Discretized Multinomial Distributions 
and Nash Equilibria in Anonymous Games 
Abstract

We show that there is a polynomial-time approximation scheme for computing Nash equilibria in anonymous games with any fixed number of strategies (a very broad and important class of games), extending the two-strategy result of [16]. The approximation guarantee follows from a probabilistic result of more general interest: The distribution of the sum of n independent unit vectors with values ranging over $\{e_1, \ldots, e_k\}$, where $e_i$ is the unit vector along dimension i of the k-dimensional Euclidean space, can be approximated by the distribution of the sum of another set of independent unit vectors whose probabilities of obtaining each value are multiples of $1/z$ for some integer z, and so that the variational distance of the two distributions is at most $\varepsilon$, where $\varepsilon$ is bounded by an inverse polynomial in z and a function of k, but with no dependence on n. Our probabilistic result specifies the construction of a surprisingly sparse $\varepsilon$-cover, under the total variation distance, of the set of distributions of sums of independent unit vectors, which is of interest in its own right.



1 Introduction 

The recent results implying that the Nash equilibrium is an intractable problem [19], even in the two-player case [?], have directed the interest of researchers towards algorithms or complexity results for special cases [25] [34] [?] and approximation algorithms [32] [31] [20] [24] [21] [10] [39, 18], and the following has emerged as the main open question in the area of equilibrium computation: Is there a PTAS for the Nash equilibrium?¹

1 It is shown in [12] that an FPTAS is no more likely than an exact solution.



In this paper we make progress on this problem, focusing on a very broad and common class of games called anonymous games [?]. A game is anonymous if the utility of each player depends not on exactly which other player chooses which strategy; instead, it only depends on the number of other players that play each strategy (that is, it is a symmetric function of the strategies played by other players). Anonymous games are a much more general class than symmetric games, in which all players are identical (symmetric games are known to be solvable in polynomial time when the number of strategies is fixed [35]). Many problems of interest in computational game theory, such as congestion games, participation games, voting games, and certain markets and auctions, are anonymous. Anonymous games have also been used for modeling certain social phenomena [?]. Since in anonymous games a player's utility depends only on the partition of the remaining players into strategies, such games are a rare case of multiplayer games that have a polynomially succinct representation, as long as the number of strategies is fixed. Our main result is a PTAS for such games. (However, it should be noted that it is not known whether this special case of the Nash equilibrium problem is PPAD-complete, and so even an exact algorithm may be possible.)

Our PTAS extends to several generalizations of anony- 
mous games, for example the case in which there are a few 
types of players, and the utilities depend on how many play- 
ers of each type play each strategy; and to the case in which 
we have extended families (disjoint graphical games of con- 
stant degree and with up to logarithmically many players, 
each with a utility depending in arbitrary, possibly non- 
anonymous, ways on their neighbors, in addition to their 
anonymous, possibly typed, interest on everybody else). Es- 
sentially any further extension leads to intractability. 

Algorithmic Game Theory aspires to understand the Internet and the markets it encompasses and creates, and therefore it should focus on multi-player games. We believe that our PTAS is a positive algorithmic result spanning a vast expanse in this space. However, because of the tremendous analytical difficulties detailed below, our algorithm is not practical (as we shall see, the number of strategies appears, exponentially, in the exponent of the running time). It could be, of course, the precursor of more practical algorithms (in fact, such an algorithm for the two-strategy case has been recently proposed [?]). But, more importantly, our algorithm should be seen as compelling computational evidence that there are very extensive and important classes of common games which are free of the negative implications of the complexity result in [19].

The basic idea of our algorithm is extremely simple and intuitive (and in fact it had been noted in the past [29]): Since we are looking for mixed strategies (probability distributions, one for each player, on the set of strategies) that are in equilibrium, we restrict our search to probability distributions assigning to the strategies probabilities that are multiples of a fixed fraction, call it $1/z$, where z is a large enough natural number. We call this process discretization. We can then consider each discrete probability distribution as a separate strategy and look for (approximate) pure equilibria in the resulting game (the utilities of the new game can be computed via dynamic programming). The challenge is to prove that any mixed Nash equilibrium of the original game has to be close to some approximate pure Nash equilibrium of the resulting game. For general games this is not very hard to see (even though it had apparently escaped the attention of the researchers who first suggested the discretization method [29]), and this observation yields a quasi-PTAS, running in time quasi-polynomial in N, for computing Nash equilibria in games in which all players have a fixed number of strategies, where N is the size of the input (Theorem 4.2; note that this complements the $N^{O(\log N/\varepsilon^2)}$ quasi-PTAS of [?] for games with a fixed number of players). We also point out that the discretization method gives the first algorithm for tree-like graphical games with a fixed number of strategies (for trees, an initial attempt by [30] in the two-strategy case was found to have flaws in [23], while in the latter paper a polynomial-time algorithm for graphical games with two strategies on paths and cycles was developed). Our algorithm applies to all graphical games with a fixed number of strategies whose graph is of bounded degree and logarithmically bounded treewidth.
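As a small illustration of the discretization step, the Python sketch below enumerates every distribution over k strategies whose probabilities are multiples of $1/z$ and compares their number with the stars-and-bars count $\binom{z+k-1}{k-1}$. The function name `discretized_strategies` is ours, chosen for illustration only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def discretized_strategies(k, z):
    """All distributions over k strategies whose probabilities are multiples of 1/z.

    Stars and bars: placing k - 1 bars among z + k - 1 slots splits z units
    of probability mass (each worth 1/z) into k non-negative parts.
    """
    result = []
    for bars in combinations(range(z + k - 1), k - 1):
        parts, prev = [], -1
        for b in bars:
            parts.append(b - prev - 1)      # stars between consecutive bars
            prev = b
        parts.append(z + k - 2 - prev)      # stars after the last bar
        result.append(tuple(Fraction(c, z) for c in parts))
    return result

strategies = discretized_strategies(k=3, z=4)
print(len(strategies), comb(4 + 3 - 1, 3 - 1))  # both counts are 15
```

Each discretized distribution then plays the role of a single "strategy" in the derived game, so the derived game has $\binom{z+k-1}{k-1}$ strategies per player.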

The discretization method requires polynomial time in the case of anonymous games, because in this case the search space is no longer the set of all n-tuples of discrete distributions, where n is the number of players (this is exponential in n); instead, via dynamic programming (see the proof of Theorem 2.2), it can be reduced to the set of all the ordered partitions of n into $\ell = O((z+1)^k)$ parts, where $\ell$ is the number of discrete probability distributions defined above. This set has $\binom{n+\ell-1}{\ell-1}$ elements, which is polynomial in n if k, the number of strategies, and z, the discretization, are fixed.
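The gap between the two search spaces can be checked with a few lines of arithmetic. The sketch below, with helper names of our own choosing, contrasts the number of ordered partitions of n players into $\ell$ discrete strategies, $\binom{n+\ell-1}{\ell-1} = O(n^{\ell-1})$, with the $\ell^n$ assignments of strategies to individual players.

```python
from math import comb

def num_partitions(n, parts):
    """Ways to write n as an ordered sum of `parts` non-negative integers."""
    return comb(n + parts - 1, parts - 1)

def num_assignments(n, parts):
    """Ways to give each of n players one of `parts` discrete strategies."""
    return parts ** n

ell = 15  # e.g., the number of discretized strategies for k = 3, z = 4
for n in (10, 20, 40):
    # the first count grows polynomially in n, the second exponentially
    print(n, num_partitions(n, ell), num_assignments(n, ell))
```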

But proving in this case that the approximation is valid turns out to be a deep problem. One has to establish a probabilistic lemma stating that, given a multinomial-sum distribution (the sum of k-dimensional unit-vector-valued independent but not necessarily identically distributed random variables), the probabilities can be rounded to multiples of $1/z$ so that the variational distance between the resulting distribution and the original one depends only on z (and in fact this dependence is inversely polynomial) and on the dimension k (in an arbitrary way; the bound we can prove is exponential, and we suspect it is necessary). This probabilistic lemma for the case of two strategies (i.e., for binomial-sum distributions) was proved in [16] by clustering the variables into three classes, depending on how large their expectation is, and then using results from the probability literature [5] [37] to approximate each component binomial-sum distribution (both the original and the rounded one) by Poisson or shifted Poisson distributions (depending on the cluster), and finally rounding the probabilities so that the approximations are close.

In the multinomial case, however, no useful approximations are known; see, e.g., [2] for some obstacles in extending the existing methods to the multinomial case. Another reason that makes the binomial case easy is that it is essentially one-dimensional: in the multinomial case, on the other hand, watching the balls in one bin, so to speak, provides little information about the distribution of the remaining balls in the other bins, because the random vectors are not identically distributed. Our proof is very involved and indirect, resorting to an alternative sampling of each random vector by funneling a ball down a probabilistic decision tree with at most $k - 1$ leaves (k is the dimension, or number of strategies), ending up eventually with a binary choice at the leaves. This choice can now be discretized similarly to the binomial case, albeit with much more effort. The decision tree topologies become the clusters for the approximation, and their number (exponential in k) appears in the total variation distance via a union bound and, hence, in the exponent of the running time. We believe that this probabilistic lemma (Theorem 2.1), and its proof, represent an advance of some substance in the state of the art in this area of applied probability.

Our result can be interpreted as constructing a surprisingly sparse cover of the set of multinomial-sum distributions under the total variation distance. Covers of metric spaces have been considered in the literature of approximation algorithms, but we know of no non-trivial result working for the total variation distance or producing a cover of the required sparsity to achieve a polynomial-time approximation scheme for the Nash equilibrium in anonymous games. To show the value of our result in another context, we exhibit a family of non-convex optimization problems arising in economics that can be approximated by means of our probabilistic lemma and for which no efficient algorithm was known before. An application of our result for this family of non-convex optimization problems is a PTAS for finding threat points in repeated anonymous games. These results are discussed in Section 5.

In the balance of this section we provide the necessary definitions. In the next section we describe the basics of the main result, including the algorithm and an overview of the proof. The main part of the proof of the probabilistic lemma is in Section 3, while in Section 4 we explore the application of our method to broad generalizations of anonymous games, as well as general (non-anonymous) games and graphical games. In Section 5 we present the application of our result to certain types of non-convex optimization problems. We conclude with a discussion of problems that remain open.

1.1 Definitions and Notation 

An anonymous game is a triple $G = (n, k, \{u^p_i\})$ where $[n] = \{1, \ldots, n\}$, $n \ge 2$, is the set of players, $[k] = \{1, \ldots, k\}$, $k \ge 2$, is the set of strategies, and $u^p_i$ with $p \in [n]$ and $i \in [k]$ is the utility of player p when she plays strategy i, a function mapping the set of partitions

$$\Pi^k_{n-1} = \Big\{(x_1, \ldots, x_k) : x_i \in \mathbb{N} \text{ for all } i \in [k],\ \sum_{i=1}^{k} x_i = n-1\Big\}$$

to the interval [0, 1].² Our working assumptions are that n is large and k is fixed; notice that, in this case, anonymous games are succinctly representable [35], in the sense that their representation requires specifying $O(n^k)$ numbers, as opposed to the $nk^n$ numbers required for general games (arguably, succinct games are the only multiplayer games that are computationally meaningful; see [35] for an extensive discussion of this point). The convex hull of the set $\Pi^k_{n-1}$ will be denoted by $\Delta^k_{n-1} = \{(x_1, \ldots, x_k) : x_i \ge 0 \text{ for all } i \in [k],\ \sum_{i=1}^{k} x_i = n-1\}$.

A pure strategy profile in such a game is a mapping S from [n] to [k]. A pure strategy profile S is an $\varepsilon$-approximate pure Nash equilibrium, where $\varepsilon \ge 0$, if, for all $p \in [n]$, $u^p_{S(p)}(x[S,p]) + \varepsilon \ge u^p_i(x[S,p])$ for all $i \in [k]$, where $x[S,p] \in \Pi^k_{n-1}$ is the partition $(x_1, \ldots, x_k)$ such that $x_i$ is the number of players $q \in [n] - \{p\}$ with $S(q) = i$.
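The definition translates directly into a check. The sketch below is a minimal illustration, assuming strategies are indexed 0, ..., k − 1 and that utilities are supplied as a Python function `utility(p, i, x)` returning $u^p_i(x)$; the congestion-style utility in the example is hypothetical and only serves to exercise the check.

```python
def is_eps_pure_nash(S, k, utility, eps):
    """Is the pure profile S (S[p] = strategy of player p) an eps-approximate
    pure Nash equilibrium of an anonymous game with k strategies?"""
    counts = [0] * k
    for s in S:
        counts[s] += 1
    for p, s in enumerate(S):
        x = list(counts)
        x[s] -= 1                      # remove player p itself: x = x[S, p]
        x = tuple(x)
        current = utility(p, s, x)
        # p must not gain more than eps by switching to any strategy i
        if any(current + eps < utility(p, i, x) for i in range(k)):
            return False
    return True

# Hypothetical congestion-style utility: a strategy is worth less the more others use it.
n, k = 4, 2
def congestion(p, i, x):
    return 1 - x[i] / (n - 1)

print(is_eps_pure_nash([0, 0, 1, 1], k, congestion, eps=0.0))  # True
print(is_eps_pure_nash([0, 0, 0, 0], k, congestion, eps=0.0))  # False
```

Since the utility only sees the count vector x, the same check works for any anonymous game, however the utilities are stored.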

A mixed strategy profile is a set of n distributions $\{\delta_p \in \Delta^k\}_{p \in [n]}$, where by $\Delta^k$ we denote the $(k-1)$-dimensional simplex, or, equivalently, the set of distributions over [k]. A mixed strategy profile is an $\varepsilon$-Nash equilibrium if, for all $p \in [n]$ and $j, j' \in [k]$,

$$\mathbb{E}_{\delta_1, \ldots, \delta_n}\, u^p_j(x) > \mathbb{E}_{\delta_1, \ldots, \delta_n}\, u^p_{j'}(x) + \varepsilon \implies \delta_p(j') = 0,$$

where x is drawn from $\Pi^k_{n-1}$ by drawing $n - 1$ random samples from [k] independently according to the distributions $\delta_q$, $q \ne p$, and forming the induced partition.

Similarly, a mixed strategy profile is an $\varepsilon$-approximate Nash equilibrium if, for all $p \in [n]$ and $j \in [k]$, $\mathbb{E}_{\delta_1, \ldots, \delta_n}\, u^p_i(x) + \varepsilon \ge \mathbb{E}_{\delta_1, \ldots, \delta_n}\, u^p_j(x)$, where i is drawn from [k] according to $\delta_p$ and x is drawn from $\Pi^k_{n-1}$ as above, by drawing $n - 1$ random samples from [k] independently according to the distributions $\delta_q$, $q \ne p$, and forming the induced partition.

Clearly, an $\varepsilon$-Nash equilibrium is also an $\varepsilon$-approximate Nash equilibrium, but the converse is not true in general (for an extensive discussion, see [19]). All our positive approximation results are for the stronger notion of the $\varepsilon$-Nash equilibrium.
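To make the $\varepsilon$-Nash condition concrete, the sketch below computes the distribution of the induced partition x by a straightforward dynamic program over count vectors (adding one of the other players at a time), takes expected payoffs, and then checks that every strategy in the support of $\delta_p$ is within $\varepsilon$ of the best one. The function names and the congestion-style utility are hypothetical; strategies are indexed from 0.

```python
from collections import defaultdict

def partition_distribution(deltas, k):
    """Distribution of the induced partition when every player in `deltas`
    independently draws a strategy from its mixed strategy (length-k list).
    After processing j players, `dist` maps each count vector to its probability."""
    dist = {(0,) * k: 1.0}
    for delta in deltas:
        new = defaultdict(float)
        for x, pr in dist.items():
            for i, q in enumerate(delta):
                if q > 0:
                    y = list(x)
                    y[i] += 1
                    new[tuple(y)] += pr * q
        dist = dict(new)
    return dist

def is_eps_nash(deltas, utility, eps):
    """Every strategy in the support of delta_p must be within eps of p's best payoff."""
    n, k = len(deltas), len(deltas[0])
    for p in range(n):
        others = deltas[:p] + deltas[p + 1:]
        dist = partition_distribution(others, k)
        payoff = [sum(pr * utility(p, j, x) for x, pr in dist.items())
                  for j in range(k)]
        best = max(payoff)
        if any(deltas[p][j] > 0 and payoff[j] + eps < best for j in range(k)):
            return False
    return True

n = 3
def congestion(p, i, x):          # hypothetical utility with values in [0, 1]
    return 1 - x[i] / (n - 1)

print(is_eps_nash([[0.5, 0.5]] * n, congestion, eps=0.0))   # True
print(is_eps_nash([[1.0, 0.0]] * n, congestion, eps=0.5))   # False
```

The dynamic program touches at most $O(n^{k-1})$ count vectors per step, which is what keeps expected-utility computations polynomial when k is fixed.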

2 The Main Result 

The total variation distance between two distributions P and Q over a finite set A is

$$\|P - Q\|_{TV} = \frac{1}{2} \sum_{\alpha \in A} |P(\alpha) - Q(\alpha)|.$$
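For finitely supported distributions stored as dictionaries, the formula is a one-liner; the helper name below is ours. Values missing from a dictionary are treated as probability 0.

```python
def total_variation(P, Q):
    """||P - Q||_TV = (1/2) * sum over a of |P(a) - Q(a)| for dicts P, Q."""
    support = set(P) | set(Q)
    return 0.5 * sum(abs(P.get(a, 0.0) - Q.get(a, 0.0)) for a in support)

print(total_variation({"a": 0.5, "b": 0.5}, {"a": 1.0}))  # 0.5
```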



Similarly, if X and Y are two random variables ranging over a finite set, their total variation distance, denoted $\|X - Y\|_{TV}$, is defined as the total variation distance between their distributions. The bulk of the paper is dedicated to proving the following result, generalizing the one-dimensional (k = 2) case established in [16].

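Theorem 2.1 Let $\{p_i \in \Delta^k\}_{i \in [n]}$, and let $\{X_i \in \mathbb{R}^k\}_{i \in [n]}$ be a set of independent k-dimensional random unit vectors such that, for all $i \in [n]$, $\ell \in [k]$, $\Pr[X_i = e_\ell] = p_{i,\ell}$, where $e_\ell \in \mathbb{R}^k$ is the unit vector along dimension $\ell$; also, let $z > 0$ be an integer. Then there exists another set of probability vectors $\{\hat p_i \in \Delta^k\}_{i \in [n]}$ such that

1. $|\hat p_{i,\ell} - p_{i,\ell}| = O(1/z)$, for all $i \in [n]$, $\ell \in [k]$;

2. $\hat p_{i,\ell}$ is an integer multiple of $\frac{1}{2^k} \cdot \frac{1}{z}$, for all $i \in [n]$, $\ell \in [k]$;

3. if $p_{i,\ell} = 0$, then $\hat p_{i,\ell} = 0$, for all $i \in [n]$, $\ell \in [k]$;

4. if $\{\hat X_i \in \mathbb{R}^k\}_{i \in [n]}$ is a set of independent random unit vectors such that $\Pr[\hat X_i = e_\ell] = \hat p_{i,\ell}$, for all $i \in [n]$, $\ell \in [k]$, then

$$\Big\| \sum_{i} X_i - \sum_{i} \hat X_i \Big\|_{TV} = O\big(f(k) \cdot z^{-1/6}\big) \qquad (1)$$

and, moreover, for all $j \in [n]$,

$$\Big\| \sum_{i \ne j} X_i - \sum_{i \ne j} \hat X_i \Big\|_{TV} = O\big(f(k) \cdot z^{-1/6}\big), \qquad (2)$$

where f(k) is an exponential function of k estimated in the proof.

2 In the literature on Nash approximation, utilities are usually normalized in this way so that the approximation error is additive.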



In other words, there is a way to quantize any set of n independent random vectors into another set of n independent random vectors, whose probabilities of obtaining each value are integer multiples of $\varepsilon \in [0, 1]$, so that the total variation distance between the distribution of the sum of the vectors before and after the quantization is bounded by $O(f(k)\, 2^{k/6} \varepsilon^{1/6})$. The important, and perhaps surprising, aspect of this bound is the lack of dependence on the number n of random vectors. From this, the main result of this section follows.
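The quantities in Theorem 2.1 can be computed exactly for small instances. The sketch below builds the exact distribution of $\sum_i X_i$ (a vector of counts) by dynamic programming in rational arithmetic and measures the total variation distance after rounding every $p_i$ to multiples of $1/m$. The rounding rule here (largest remainder, our own choice) is deliberately naive and is not the construction used in the proof; Section 2.1 explains why naive rounding is not expected to give a bound independent of n, so this only shows what is being measured, not the theorem's guarantee.

```python
from collections import defaultdict
from fractions import Fraction

def sum_distribution(prob_vectors):
    """Exact distribution of sum_i X_i, where X_i = e_l with probability prob_vectors[i][l]."""
    k = len(prob_vectors[0])
    dist = {(0,) * k: Fraction(1)}
    for p in prob_vectors:
        new = defaultdict(Fraction)
        for x, pr in dist.items():
            for l, q in enumerate(p):
                if q:
                    y = list(x)
                    y[l] += 1
                    new[tuple(y)] += pr * q
        dist = dict(new)
    return dist

def total_variation(P, Q):
    support = set(P) | set(Q)
    return sum(abs(P.get(a, 0) - Q.get(a, 0)) for a in support) / 2

def naive_round(p, m):
    """Round a probability vector to multiples of 1/m (largest-remainder method).
    NOT the construction of Theorem 2.1; it only provides a comparison point."""
    scaled = [q * m for q in p]
    floors = [int(s) for s in scaled]          # floor, since s >= 0
    missing = m - sum(floors)                  # units of 1/m still to hand out
    order = sorted(range(len(p)), key=lambda l: scaled[l] - floors[l], reverse=True)
    for l in order[:missing]:
        floors[l] += 1
    return [Fraction(c, m) for c in floors]

original = [[Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)],
            [Fraction(1, 7), Fraction(2, 7), Fraction(4, 7)],
            [Fraction(1, 2), Fraction(1, 2), Fraction(0)]]
for m in (4, 16, 64):
    rounded = [naive_round(p, m) for p in original]
    tv = total_variation(sum_distribution(original), sum_distribution(rounded))
    print(m, float(tv))
```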

Theorem 2.2 There is a PTAS for the mixed Nash equilib- 
rium problem for anonymous games with a constant number 
of strategies. 

Proof: Consider a mixed Nash equilibrium $(p_1, \ldots, p_n)$. We claim that the mixed strategy profile $(\hat p_1, \ldots, \hat p_n)$ specified by Theorem 2.1 constitutes an $O(f(k) z^{-1/6})$-Nash equilibrium. Indeed, for every player $i \in [n]$ and every pure strategy $m \in [k]$ for that player, let us track down the change in the expected utility of the player for playing strategy m when the distribution over $\Pi^k_{n-1}$ defined by the $\{p_j\}_{j \ne i}$ is replaced by the distribution defined by the $\{\hat p_j\}_{j \ne i}$. It is not hard to see that the absolute change is bounded by the total variation distance between the distributions of the random vectors $\sum_{j \ne i} X_j$ and $\sum_{j \ne i} \hat X_j$, where $\{X_j\}_{j \ne i}$ are independent random vectors distributed according to the distributions $\{p_j\}_{j \ne i}$ and, similarly, $\{\hat X_j\}_{j \ne i}$ are independent random vectors distributed according to the distributions $\{\hat p_j\}_{j \ne i}$.³ Hence, by Theorem 2.1, the change in the utility of the player is at most $O(f(k) z^{-1/6})$, which implies that the $\hat p_i$'s constitute an $O(f(k) z^{-1/6})$-Nash equilibrium of the game. If we take $z = (f(k)/\varepsilon)^6$, this is a $\delta$-Nash equilibrium, for $\delta = O(\varepsilon)$.
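The choice of z in the last step is a direct substitution into the bound of Theorem 2.1 (if $(f(k)/\varepsilon)^6$ is not an integer, one can take z to be its ceiling, which can only decrease $z^{-1/6}$):

$$f(k)\, z^{-1/6} \le f(k) \left(\Big(\frac{f(k)}{\varepsilon}\Big)^{6}\right)^{-1/6} = f(k) \cdot \frac{\varepsilon}{f(k)} = \varepsilon, \qquad \text{so} \qquad O\big(f(k)\, z^{-1/6}\big) = O(\varepsilon).$$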

From the previous discussion it follows that there exists a mixed strategy profile $\{\hat p_i\}_i$ which is of the very special kind described by Property 2 in the statement of Theorem 2.1 and constitutes a $\delta$-Nash equilibrium of the given game, if we choose $z = (f(k)/\varepsilon)^6$. The problem is, of course, that we do not know such a mixed strategy profile and, moreover, we cannot afford to do exhaustive search over all mixed strategy profiles satisfying Property 2, since there is an exponential number of those. We do instead the following search, which is guaranteed to find a $\delta$-Nash equilibrium.

Notice that there are at most $(2^k z)^k = 2^{k^2} (f(k)/\varepsilon)^{6k} =: K$ "quantized" mixed strategies with each probability being a multiple of $\frac{1}{2^k z}$, where $z = (f(k)/\varepsilon)^6$. Let $\mathcal{K}$ be the set of such quantized mixed strategies. We start our algorithm by guessing the partition of the number n of players into quantized mixed strategies; let $\theta = \{\theta_\sigma\}_{\sigma \in \mathcal{K}}$ be the partition, where $\theta_\sigma$ represents the number of players choosing the discretized mixed strategy $\sigma \in \mathcal{K}$. Now we only need to determine if there exists an assignment of mixed strategies to the players in [n], with $\theta_\sigma$ of them playing mixed strategy $\sigma \in \mathcal{K}$, so that the corresponding mixed strategy profile is a $\delta$-Nash equilibrium. To answer this question it is enough to solve the following max-flow problem. Let us consider the bipartite graph $([n], \mathcal{K}, E)$ with edge set E defined as follows: $(i, \sigma) \in E$, for $i \in [n]$ and $\sigma \in \mathcal{K}$, if $\theta_\sigma > 0$ and $\sigma$ is a $\delta$-best response for player i when the partition of the other players into the mixed strategies in $\mathcal{K}$ is the partition $\theta$ with one unit subtracted from $\theta_\sigma$.⁴ Note that expected payoff computations are required to define E. By straightforward dynamic programming, the expected utility of player i for playing pure strategy $s \in [k]$ given the mixed strategies of the other players can be computed with $O(kn^k)$ operations on numbers with at most $b(n, z, k) := \lceil 1 + n(k + \log_2 z) + \log_2(1/u_{\min}) \rceil$ bits, where $u_{\min}$ is the smallest non-zero payoff value of the game.⁵ To conclude the construction of the max-flow instance we add a source node u connected to all the left-hand-side nodes and a sink node v connected to all the right-hand-side nodes. We set the capacity of the edge $(\sigma, v)$ equal to $\theta_\sigma$, for all $\sigma \in \mathcal{K}$, and the capacity of all other edges equal to 1. If the max-flow from u to v has value n, then there is a way to assign discretized mixed strategies to the players so that $\theta_\sigma$ of them play mixed strategy $\sigma \in \mathcal{K}$ and the resulting mixed strategy profile is a $\delta$-Nash equilibrium (details omitted). There are at most $(n+1)^{K-1}$ possible guesses for $\theta$; hence, the search takes overall time

$$O\Big(\big(nKk^2 n^k\, b(n, z, k) + p(n + K + 2)\big) \cdot (n+1)^{K-1}\Big),$$

where $p(n + K + 2)$ is the time needed to find an integral maximum flow in a graph with $n + K + 2$ nodes and edge-weights encoded with at most $\lceil \log_2 n \rceil$ bits. Hence, the overall time is

$$n^{O\left(2^{k^2} (f(k)/\varepsilon)^{6k}\right)} \cdot \log_2(1/u_{\min}).$$

3 To establish this bound we use the fact that all utilities lie in [0, 1].
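The assignment step for a fixed guess $\theta$ can be sketched as follows. We use Edmonds-Karp (BFS augmenting paths) as the max-flow routine; the text does not commit to a particular algorithm for $p(\cdot)$, so this is one standard choice. The sets `best_responses[i]` of $\delta$-best responses are taken as given inputs (computing them requires the expected-payoff dynamic programming described above), and all node and strategy names are hypothetical.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp on a residual graph: capacity[u][v] is mutated in place."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:          # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in capacity[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:                 # smallest residual capacity on the path
            bottleneck = min(bottleneck, capacity[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:                 # push flow, update reverse edges
            u = parent[v]
            capacity[u][v] -= bottleneck
            capacity[v][u] = capacity[v].get(u, 0) + bottleneck
            v = u
        flow += bottleneck

def assign_strategies(n, theta, best_responses):
    """Give theta[s] players the quantized strategy s, using only edges (i, s)
    with s in best_responses[i]. Returns {player: s}, or None if impossible."""
    src, snk = "source", "sink"
    cap = {src: {}, snk: {}}
    cap.update({("player", i): {} for i in range(n)})
    cap.update({("strategy", s): {} for s in theta})
    for i in range(n):
        cap[src][("player", i)] = 1
        for s in best_responses[i]:
            if theta.get(s, 0) > 0:
                cap[("player", i)][("strategy", s)] = 1
    for s, t in theta.items():
        cap[("strategy", s)][snk] = t
    if max_flow(cap, src, snk) != n:
        return None
    # A used player->strategy edge is left with residual capacity 0.
    assignment = {}
    for i in range(n):
        for node, c in cap[("player", i)].items():
            if node != src and node[0] == "strategy" and c == 0:
                assignment[i] = node[1]
    return assignment

theta = {"sigma1": 2, "sigma2": 1}
best = {0: {"sigma1"}, 1: {"sigma1", "sigma2"}, 2: {"sigma1"}}
print(assign_strategies(3, theta, best))  # {0: 'sigma1', 1: 'sigma2', 2: 'sigma1'}
```

In the example the only valid assignment sends player 1 to `sigma2`, and the augmenting-path search finds it by rerouting flow through the reverse edge of `sigma1`.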



Remark: Theorem 2.1 can be interpreted as constructing a sparse cover of the set of distributions of sums of independent random unit vectors under the total variation distance. We know of no non-trivial results working for this distance or achieving the same sparsity.

4 For our discussion, a mixed strategy $\sigma$ of player i is a $\delta$-best response to a set of mixed strategies for the other players iff the expected payoff of player i for playing any pure strategy s in the support of $\sigma$ is no more than $\delta$ worse than her expected payoff for playing any pure strategy $s'$.

5 To compute a bound on the number of bits required for the expected utility computations, note that the expected utility is non-negative, cannot exceed 1, and its smallest possible non-zero value is at least $\left(\frac{1}{2^k z}\right)^{n} u_{\min}$, since the mixed strategies of all players are from the set $\mathcal{K}$.



2.1 Discussion of Proof Techniques 

Observe that, from a technical perspective, the k > 2 case of Theorem 2.1 is inherently different from the k = 2 case, which was shown in [16] (Theorem 3.1). Indeed, when k = 2, knowledge of the number of players who selected their first strategy determines the whole partition of the number of players into strategies; therefore, in this case the probabilistic experiment is in some sense one-dimensional. On the other hand, when k > 2, knowledge of the number of "balls in a bin", that is, the number of players who selected a particular strategy, does not provide full information about the number of balls in the other bins. This complication would be quite benign if the vectors $X_i$ were identically distributed, since in this case the number of balls in a bin would at least characterize precisely the probability distribution of the number of balls in the other bins (as a multinomial distribution with one bin less and the bin-probabilities appropriately renormalized). But, in our case, the vectors $X_i$ are not identically distributed. Hence, already for k = 3 the problem is fundamentally more involved than in the k = 2 case.

Indeed, it turns out that obtaining the result for the k = 2 case is easier. Here is the intuition: If the expectation of every $X_i$ at the first bin were small, their sum would be distributed like a Poisson distribution (marginally at that bin); if the expectation of every $X_i$ were large, the sum would be distributed like a (discretized) Normal distribution.⁶ So, to establish the result we can do the following (see [16] for details): First, we cluster the $X_i$'s into those with small and those with large expectation at the first bin, and then we discretize the $X_i$'s separately in the two clusters in such a way that the sum of their expectations (within each cluster) is preserved to within the discretization accuracy. To show the closeness in total variation distance between the sum of the $X_i$'s before and after the discretization, we compare instead the Poisson or Normal distributions (depending on the cluster) which approximate the sum of the $X_i$'s: For the "small cluster", we compare the Poisson distributions approximating the sum of the $X_i$'s before and after the discretization. For the "large cluster", we compare the Normals approximating the sum of the $X_i$'s before and after the discretization.
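The "small cluster" intuition can be checked numerically in the k = 2 case. The sketch below, with parameter choices of our own, computes the exact distribution of a sum of independent Bernoulli variables and its total variation distance from the Poisson distribution with the same mean. With many small success probabilities the distance is tiny (Le Cam's inequality bounds it by $\sum_i p_i^2$), while with large probabilities the Poisson fit is poor, which is why the large cluster is compared with a Normal instead.

```python
from math import exp

def poisson_binomial_pmf(ps):
    """Exact pmf of the number of successes among independent Bernoulli(p) trials."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for j, q in enumerate(pmf):
            nxt[j] += q * (1 - p)
            nxt[j + 1] += q * p
        pmf = nxt
    return pmf

def tv_to_poisson(ps):
    """Total variation distance between the sum of Bernoulli(ps) and Poisson(sum(ps))."""
    lam = sum(ps)
    pmf = poisson_binomial_pmf(ps)
    poisson = [exp(-lam)]
    for j in range(1, len(pmf)):
        poisson.append(poisson[-1] * lam / j)        # Poisson pmf computed iteratively
    tail = max(0.0, 1.0 - sum(poisson))              # Poisson mass above len(ps)
    return 0.5 * (sum(abs(a - b) for a, b in zip(pmf, poisson)) + tail)

print(tv_to_poisson([0.01] * 100))   # small expectations: close to Poisson
print(tv_to_poisson([0.5] * 100))    # large expectations: Poisson fit is poor
```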

One would imagine that a similar technique, i.e., approximating by a multidimensional Poisson or Normal distribution, would work for the k > 2 case. Comparing a sum of multinomial random variables to a multidimensional Poisson or Normal distribution is a little harder in many dimensions (see the discussion in [2]), but almost optimal bounds are known for both the multidimensional Poisson [2] [38] and the multidimensional Normal [6] [26] approximations. Nevertheless, these results by themselves are not sufficient for our setting: Approximating by a multidimensional Normal performs very poorly at the coordinates where the vectors have small expectations, and approximating by a multidimensional Poisson fails at the coordinates where the vectors have large expectations. And in our case, it could very well be that the sum of the $X_i$'s is distributed like a multidimensional Poisson distribution in a subset of the coordinates and like a multidimensional Normal in the complement (those coordinates where the $X_i$'s have respectively small or large expectations). What we really need, instead, is a multidimensional approximation result that combines the multidimensional Poisson and Normal approximations in the same picture; and such a result is not known.

6 Comparing, in terms of variational distance, a sum of independent Bernoulli random variables to a Poisson or a Normal distribution is an important problem in probability theory. The approximations we use are obtained by applications of Stein's method [3] [4] [37].

Our approach instead is very indirect. We define an alternative way of sampling the vectors $X_i$ which consists of performing a random walk on a binary decision tree and performing a probabilistic choice between two strategies at the leaves of the tree (Sections 3.1 and 3.2). The random vectors are then clustered so that, within a cluster, all vectors share the same decision tree (Section 3.3), and the rounding, performed separately for every cluster, consists of discretizing the probabilities for the probabilistic experiments at the leaves of the tree (Section 3.4). The rounding is done in such a way that, if all vectors $X_i$ were to end up at the same leaf after walking on the decision tree, then the one-dimensional result described above would apply for the (binary) probabilistic choice that the vectors are facing at the leaf. However, the random walks will not all end up at the same leaf with high probability. To remedy this, we define a coupling between the random walks of the original and the discretized vectors for which, in the typical case, the probabilistic experiments that the original vectors will run at every leaf of the tree are very "similar" to the experiments that the discretized vectors will run. That is, our coupling guarantees that, with high probability over the random walks, the total variation distance between the choices (as random variables) that are to be made by the original vectors at every leaf of the decision tree and the choices (again as random variables) that are to be made by the discretized vectors is very small. The coupling of the random walks is defined in Section 3.5, and a quantification of the similarity of the leaf experiments under this coupling is given in Section 3.6.

For a discussion about why naive approaches such as rounding to the closest discrete distribution or randomized rounding do not appear useful, even for the k = 2 case, see Section 3.1 of [16].



3 Proof of Theorem 2.1

3.1 The Trickle-down Process 

Consider the mixed strategy $p_i$ of player i. The crux of our argument is an alternative way to sample from this distribution, based on the so-called trickle-down process, defined next.

TDP (Trickle-Down Process)

Input: (S, p), where $S = \{i_1, \ldots, i_m\} \subseteq [k]$ is a set of strategies and p a probability distribution with $p(i_j) > 0$, $j = 1, \ldots, m$. We assume that the elements of S are ordered $i_1, \ldots, i_m$ in such a way that (a) $p(i_2)$ is the largest of the $p(i_j)$'s and (b) for $2 \ne j < j' \ne 2$, $p(i_j) \le p(i_{j'})$. That is, the largest probability is second, and, other than that, the probabilities are sorted in non-decreasing order (ties broken lexicographically).

if $|S| \le 2$ stop;

else apply the partition and double operation:

1. let $\ell^* \le m$ be the (unique) index such that $\sum_{\ell < \ell^*} p(i_\ell) \le \frac{1}{2}$ and $\sum_{\ell > \ell^*} p(i_\ell) < \frac{1}{2}$;

2. Define the sets $S_L = \{i_\ell : \ell \le \ell^*\}$ and $S_R = \{i_\ell : \ell \ge \ell^*\}$;

3. Define the probability distribution $p_L$ such that, for all $\ell < \ell^*$, $p_L(i_\ell) = 2p(i_\ell)$. Also, let $t := 1 - \sum_{\ell=1}^{\ell^*-1} p_L(i_\ell)$; if $t = 0$, then remove $i_{\ell^*}$ from $S_L$, otherwise set $p_L(i_{\ell^*}) = t$. Similarly, define the probability distribution $p_R$ such that $p_R(i_\ell) = 2p(i_\ell)$, for all $\ell > \ell^*$, and $p_R(i_{\ell^*}) = 1 - \sum_{\ell=\ell^*+1}^{m} p_R(i_\ell)$. Notice that, because of the way we have ordered the strategies in S, $i_{\ell^*}$ is neither the first nor the last element of S in our ordering, and hence $2 \le |S_L|, |S_R| < |S|$.

4. call TDP$(S_L, p_L)$; call TDP$(S_R, p_R)$;

That is, TDP splits the support of the mixed strategy of 
a player into a tree of finer and finer sets of strategies, with 
all leaves having just two strategies. At each level the two 
sets in which the set of strategies is split overlap in at most 
one strategy (whose probability mass is divided between its 
two copies). The two sets then have probabilities adding up 
to 1/2, but then the probabilities are multiplied by 2, so that 
each node of the tree represents a distribution. 
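The following Python sketch implements TDP as stated above, in exact rational arithmetic. A node is represented as `("leaf", distribution)` or `("node", left, right)`, a representation chosen only for illustration, and ties in the ordering are broken by strategy index, our reading of "ties broken lexicographically". The `recombine` helper sums the leaf distributions weighted by $2^{-\text{depth}}$, which is exactly the content of Lemma 3.1 in Section 3.2, so the final check should print True.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def tdp_order(p):
    """Order the support as TDP requires: largest probability second,
    all others non-decreasing (ties broken by strategy index)."""
    items = sorted(p.items(), key=lambda kv: (kv[1], kv[0]))
    largest = items.pop()            # last item has the largest probability
    items.insert(1, largest)
    return items

def tdp(p):
    """Trickle-down process on p: dict strategy -> positive Fraction."""
    if len(p) <= 2:
        return ("leaf", dict(p))
    items = tdp_order(p)
    # Step 1: the unique l* with sum_{l < l*} p <= 1/2 and sum_{l > l*} p < 1/2.
    prefix, star = Fraction(0), None
    for idx, (_, prob) in enumerate(items):
        if prefix <= HALF and 1 - prefix - prob < HALF:
            star = idx
            break
        prefix += prob
    assert star is not None and 0 < star < len(items) - 1
    pivot = items[star][0]
    # Steps 2-3: split at the pivot, double the probabilities on each side.
    left = {s: 2 * q for s, q in items[:star]}
    t = 1 - sum(left.values())
    if t > 0:
        left[pivot] = t              # if t == 0 the pivot is dropped from S_L
    right = {s: 2 * q for s, q in items[star + 1:]}
    right[pivot] = 1 - sum(right.values())
    # Step 4: recurse on both halves.
    return ("node", tdp(left), tdp(right))

def leaves(tree, depth=0):
    """Yield (depth, leaf distribution) pairs."""
    if tree[0] == "leaf":
        yield depth, tree[1]
    else:
        yield from leaves(tree[1], depth + 1)
        yield from leaves(tree[2], depth + 1)

def recombine(tree):
    """Sum over leaves of 2^(-depth) times the leaf distribution."""
    out = {}
    for depth, dist in leaves(tree):
        for s, q in dist.items():
            out[s] = out.get(s, Fraction(0)) + q / 2 ** depth
    return out

p = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10), 4: Fraction(4, 10)}
tree = tdp(p)
print(tree)
print(recombine(tree) == p)   # True
```

For this p, the pivot is strategy 2 with $t = 0$, so the tree has two leaves, $\{1, 4\}$ and $\{3, 2\}$, consistent with the bound of at most $k - 1$ leaves in Claim 3.3.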

3.2 The Alternative Sampling of $X_i$

Let $p_i$ be the mixed strategy of player i, and $S_i$ be its support.⁷ The execution of TDP$(S_i, p_i)$ defines a rooted binary tree $T_i$ with node set $V_i$ and set of leaves $\partial T_i$. Each node $v \in V_i$ is identified with a pair $(S_v, p_{i,v})$, where $S_v \subseteq [k]$ is a set of strategies and $p_{i,v}$ is a distribution over $S_v$. Based on this tree, we define the following alternative way to sample $X_i$.

7 In this section and the following two sections we assume that $|S_i| > 1$; if not, we set $\hat p_i = p_i$, and all claims we make in Sections 3.5 and 3.6 are trivially satisfied.

Sampling $X_i$

1. (Stage 1) Perform a random walk from the root of the tree $T_i$ to the leaves, where, at every non-leaf node, the left or right child is chosen with probability 1/2; let $\Phi_i \in \partial T_i$ be the (random) leaf chosen by the random walk;

2. (Stage 2) Let (S, p) be the label assigned to the leaf $\Phi_i$, where $S = \{\ell_1, \ell_2\}$; set $X_i = e_{\ell_1}$ with probability $p(\ell_1)$, and $X_i = e_{\ell_2}$ with probability $p(\ell_2)$.

The following lemma, whose straightforward proof we 
omit, states that this is indeed an alternative sampling of the 
mixed strategy of player i. 

Lemma 3.1 For all $i \in [n]$, the process Sampling $X_i$ outputs $X_i = e_\ell$ with probability $p_{i,\ell}$, for all $\ell \in [k]$.
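A Monte Carlo sketch of the two stages makes Lemma 3.1 easy to check empirically. The tree below is hard-coded in a hypothetical `("leaf", distribution)` / `("node", left, right)` format; it is the tree that TDP, applied as described in Section 3.1, produces for the distribution (0.1, 0.2, 0.3, 0.4) over strategies 1 to 4. The empirical frequencies should approach those four probabilities.

```python
import random

# TDP tree for p = (0.1, 0.2, 0.3, 0.4) over strategies 1..4 (pivot 2, t = 0).
TREE = ("node", ("leaf", {1: 0.2, 4: 0.8}), ("leaf", {3: 0.6, 2: 0.4}))

def sample_strategy(tree, rng):
    """Stage 1: walk from the root, going left or right with probability 1/2.
    Stage 2: at the leaf, choose one of its two strategies by the leaf's probabilities."""
    while tree[0] == "node":
        tree = tree[1] if rng.random() < 0.5 else tree[2]
    (s1, q1), (s2, _) = tree[1].items()
    return s1 if rng.random() < q1 else s2

rng = random.Random(0)
trials = 200_000
counts = {s: 0 for s in (1, 2, 3, 4)}
for _ in range(trials):
    counts[sample_strategy(TREE, rng)] += 1
print({s: round(c / trials, 3) for s, c in counts.items()})
```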

3.3 Clustering the Random Vectors 

We use the process TDP to cluster the random vectors of the set $\{X_i\}_{i \in [n]}$. We define a cell for every possible tree structure. In particular, for some $\alpha > 0$ to be determined later in the proof,

Definition 3.2 (Cell Definition) Two vectors $X_i$ and $X_j$ belong to the same cell if

• there exists a tree isomorphism $f_{i,j} : V_i \to V_j$ between the trees $T_i$ and $T_j$ such that, for all $u \in V_i$, $v \in V_j$, if $f_{i,j}(u) = v$, then $S_u = S_v$, and in fact the elements of $S_u$ and $S_v$ are ordered the same way by $p_{i,u}$ and $p_{j,v}$;

• if $u \in \partial T_i$, $v = f_{i,j}(u) \in \partial T_j$, and $\ell^* \in S_u = S_v$ is the strategy with the smallest probability mass for both $p_{i,u}$ and $p_{j,v}$, then either $p_{i,u}(\ell^*), p_{j,v}(\ell^*) < \frac{\lfloor z^\alpha \rfloor}{z}$ or $p_{i,u}(\ell^*), p_{j,v}(\ell^*) \ge \frac{\lfloor z^\alpha \rfloor}{z}$; the leaf is called a Type A leaf in the first case, and a Type B leaf in the second case.

It is easy to see that the total number of cells is bounded by a function of k only, estimated in the following claim; the proof of the claim is postponed to Appendix A.

Claim 3.3 Any tree resulting from TDP has at most $k - 1$ leaves, and the total number of cells is bounded by $g(k) = k^{k^2}\, 2^{k-1}\, 2^{k}\, k!$.



3.4 Discretization within a Cell 



3.5 Coupling within a Cell 



Recall that our goal is to "discretize" the probabilities in the distribution of the $X_i$'s. We will do this separately in every cell of our clustering. In particular, supposing that $\{X_i\}_{i\in\mathcal{I}}$ is the set of vectors falling in a particular cell, for some index set $\mathcal{I}$, we will define a set of "discretized" vectors $\{\hat X_i\}_{i\in\mathcal{I}}$ in such a way that, for $h(k) = k 2^k$, and for all $j \in \mathcal{I}$,

\[ \Big\| \sum_{i\in\mathcal{I}} X_i - \sum_{i\in\mathcal{I}} \hat X_i \Big\|_{TV} \le O\big(h(k) \log z \cdot z^{-1/5}\big); \qquad (3) \]

\[ \Big\| \sum_{i\in\mathcal{I}\setminus\{j\}} X_i - \sum_{i\in\mathcal{I}\setminus\{j\}} \hat X_i \Big\|_{TV} \le O\big(h(k) \log z \cdot z^{-1/5}\big). \qquad (4) \]

We establish these bounds in Section 3.5. Using the bound on the number of cells in Claim 3.3, an easy application of the coupling lemma implies the bounds shown in (1) and (2) for $f(k) := h(k) \cdot g(k)$, thus concluding the proof of Theorem 2.1.

We shall henceforth concentrate on a particular cell containing the vectors $\{X_i\}_{i\in\mathcal{I}}$, for some $\mathcal{I} \subseteq [n]$. Since the trees $\{T_i\}_{i\in\mathcal{I}}$ are isomorphic, for notational convenience we shall denote all those trees by $T$. To define the vectors $\{\hat X_i\}_{i\in\mathcal{I}}$ we must provide, for all $i \in \mathcal{I}$, a distribution $\hat p_i : [k] \to [0,1]$ such that $\Pr[\hat X_i = e_\ell] = \hat p_i(\ell)$, for all $\ell \in [k]$. To do this, we assign to all $\{\hat X_i\}_{i\in\mathcal{I}}$ the tree $T$ and then, for every leaf $v \in \partial T$ and $i \in \mathcal{I}$, define a distribution $\hat p_{i,v}$ over the two-element ordered set $S_v$, by the ROUNDING process below. Then the distribution $\hat p_i$ is implicitly defined as

\[ \hat p_i(\ell) = \sum_{v \in \partial T :\, \ell \in S_v} 2^{-\mathrm{depth}_T(v)}\, \hat p_{i,v}(\ell). \]



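ROUNDING: for all $v \in \partial T$ with $S_v = \{\ell_1, \ell_2\}$, $\ell_1, \ell_2 \in [k]$, do the following:

1. find a set of probabilities $\{\hat q_i\}_{i\in\mathcal{I}}$ with the following properties:

• for all $i \in \mathcal{I}$, $|\hat q_i - p_{i,v}(\ell_1)| \le \frac{1}{z}$;

• for all $i \in \mathcal{I}$, $\hat q_i$ is an integer multiple of $\frac{1}{z}$;

• $\big|\sum_{i\in\mathcal{I}} \hat q_i - \sum_{i\in\mathcal{I}} p_{i,v}(\ell_1)\big| \le \frac{1}{z}$;

2. for all $i \in \mathcal{I}$, set $\hat p_{i,v}(\ell_1) := \hat q_i$ and $\hat p_{i,v}(\ell_2) := 1 - \hat q_i$.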

Finding the set of probabilities required by Step 1 of the ROUNDING process is straightforward and the details are omitted (see [16], Section 3.3, for a way to do so). It is now easy to check that the set of probability vectors $\{\hat p_i\}_{i\in\mathcal{I}}$ satisfies Properties 1, 2 and 3 of Theorem 2.1.
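The paper leaves Step 1 of ROUNDING to [16]. As an illustration only, and not necessarily the method of [16], the following Python sketch floors prefix sums: with $S_i = p_1 + \cdots + p_i$ it sets $\hat q_i = (\lfloor z S_i \rfloor - \lfloor z S_{i-1} \rfloor)/z$. Each $\hat q_i$ is then a multiple of $1/z$ in $[0,1]$ with $|\hat q_i - p_i| < 1/z$, and the rounded total stays within $1/z$ of the original total. The function name `round_to_grid` is hypothetical, and the sketch makes no attempt to preserve the Type A/Type B classification or the ordering within $S_v$.

```python
import math
from fractions import Fraction


def round_to_grid(probs, z):
    """Round probabilities to integer multiples of 1/z by flooring prefix sums.

    With S_i = p_1 + ... + p_i, set q_i = (floor(z*S_i) - floor(z*S_{i-1})) / z.
    Every q_i is a multiple of 1/z in [0, 1], |q_i - p_i| < 1/z, and
    |sum(q) - sum(p)| < 1/z.
    """
    rounded = []
    prefix = Fraction(0)
    prev_floor = 0
    for p in probs:
        prefix += Fraction(p)              # exact arithmetic, no float drift
        cur_floor = math.floor(z * prefix)
        rounded.append(Fraction(cur_floor - prev_floor, z))
        prev_floor = cur_floor
    return rounded


if __name__ == "__main__":
    ps = [0.137, 0.52, 0.049, 0.33, 0.91]
    qs = round_to_grid(ps, 10)
    print([str(q) for q in qs])
```

Exact `Fraction` arithmetic is used so that the floors of the prefix sums are not disturbed by floating-point error.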



We now come to the main part of the proof: showing that the variational distance between the original and the discretized distribution within a cell depends only on $z$ and $k$. We will only argue that our discretization satisfies (3); the proof of (4) is identical.

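Before proceeding let us introduce some notation. Specifically,

• let $\Phi_i \in \partial T$ be the leaf chosen by Stage 1 of the process SAMPLING $X_i$ and $\hat\Phi_i \in \partial T$ the leaf chosen by Stage 1 of SAMPLING $\hat X_i$;

• let $\Phi = (\Phi_i)_{i\in\mathcal{I}}$ and let $G$ denote the distribution of $\Phi$; similarly, let $\hat\Phi = (\hat\Phi_i)_{i\in\mathcal{I}}$ and let $\hat G$ denote the distribution of $\hat\Phi$.

Moreover, for all $v \in \partial T$, with $S_v = \{\ell_1, \ell_2\}$ and ordering $(\ell_1, \ell_2)$,

• let $\mathcal{I}_v \subseteq \mathcal{I}$ be the (random) index set such that $i \in \mathcal{I}_v$ iff $i \in \mathcal{I} \wedge \Phi_i = v$ and, similarly, let $\hat{\mathcal{I}}_v \subseteq \mathcal{I}$ be the (random) index set such that $i \in \hat{\mathcal{I}}_v$ iff $i \in \mathcal{I} \wedge \hat\Phi_i = v$;

• let $\mathcal{J}_{v,1}, \mathcal{J}_{v,2} \subseteq \mathcal{I}_v$ be the (random) index sets such that $i \in \mathcal{J}_{v,1}$ iff $i \in \mathcal{I}_v \wedge X_i = e_{\ell_1}$ and $i \in \mathcal{J}_{v,2}$ iff $i \in \mathcal{I}_v \wedge X_i = e_{\ell_2}$;

• let $\mathcal{T}_{v,1} = |\mathcal{J}_{v,1}|$, $\mathcal{T}_{v,2} = |\mathcal{J}_{v,2}|$ and let $F_v$ denote the distribution of $\mathcal{T}_{v,1}$;

• let $\mathcal{T} := ((\mathcal{T}_{v,1}, \mathcal{T}_{v,2}))_{v\in\partial T}$ and let $F$ denote the distribution of $\mathcal{T}$;

• let $\hat{\mathcal{J}}_{v,1}, \hat{\mathcal{J}}_{v,2}, \hat{\mathcal{T}}_{v,1}, \hat{\mathcal{T}}_{v,2}, \hat{\mathcal{T}}, \hat F_v, \hat F$ be defined similarly.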

The following is easy to see; we postpone its proof to the appendix.

Claim 3.4 For all $\theta \in (\partial T)^{\mathcal{I}}$, $G(\theta) = \hat G(\theta)$.

Since $G$ and $\hat G$ are the same distribution we will henceforth denote that distribution by $G$. The following lemma is sufficient to conclude the proof of Theorem 2.1.

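Lemma 3.5 There exists a value of $\alpha$, used in the definition of the cells, such that, for all $v \in \partial T$,

\[ G\left(\theta : \big\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\Big(\frac{2^k \log z}{z^{1/5}}\Big)\right) \ge 1 - \frac{4}{z^{1/3}}, \qquad (5) \]

where $F_v(\cdot \mid \Phi)$ denotes the conditional probability distribution of $\mathcal{T}_{v,1}$ given $\Phi$ and, similarly, $\hat F_v(\cdot \mid \hat\Phi)$ denotes the conditional probability distribution of $\hat{\mathcal{T}}_{v,1}$ given $\hat\Phi$.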



Lemma 3.5 states roughly that, for all $v \in \partial T$, with probability at least $1 - 4/z^{1/3}$ over the choices made by Stage 1 of the processes $\{\text{SAMPLING } X_i\}_{i\in\mathcal{I}}$ and $\{\text{SAMPLING } \hat X_i\}_{i\in\mathcal{I}}$ (assuming that these processes are coupled to make the same decisions in Stage 1), the total variation distance between the conditional distributions of $\mathcal{T}_{v,1}$ and $\hat{\mathcal{T}}_{v,1}$ is bounded by $O(2^k \log z / z^{1/5})$.

The following lemma, whose proof is provided in the appendix, concludes the proof of the main theorem.



Lemma 3.6 (5) implies

\[ \|F - \hat F\|_{TV} \le O\Big(k\, \frac{2^k \log z}{z^{1/5}}\Big). \qquad (6) \]

Note that (6) easily implies (3).

3.6 Proof of Lemma 3.5

To conclude the proof of Theorem 2.1 it remains to show Lemma 3.5. Roughly speaking, the proof consists of showing that, with high probability over the random walks performed in Stage 1 of SAMPLING, the one-dimensional experiment occurring at a particular leaf $v$ of the tree is similar in both the original and the discretized distribution. The similarity is quantified by Lemmas 3.10 and 3.11 for leaves of Type A and B respectively. Then, Lemmas 3.7, 3.8 and 3.9 establish that, if the experiments are sufficiently similar, they can be coupled so that their outcomes agree with high probability.

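More precisely, let $v \in \partial T$, $S_v = \{\ell_1, \ell_2\}$, and suppose the ordering $(\ell_1, \ell_2)$. Also, let us denote $\ell^*_v = \ell_1$ and define the following functions, for $\theta \in (\partial T)^{\mathcal{I}}$:

\[ \mu_v(\theta) := \sum_{i \in \mathcal{I} :\, \theta_i = v} p_{i,v}(\ell^*_v), \qquad \hat\mu_v(\theta) := \sum_{i \in \mathcal{I} :\, \theta_i = v} \hat p_{i,v}(\ell^*_v). \]

Note that the random variable $\mu_v(\Phi)$ represents the total probability mass that is placed on the strategy $\ell^*_v$ after Stage 1 of the SAMPLING process is completed for all vectors $X_i$, $i \in \mathcal{I}$. Conditioned on the outcome of Stage 1 of SAMPLING for the vectors $\{X_i\}_{i\in\mathcal{I}}$, $\mu_v(\Phi)$ is the expected number of the vectors from $\mathcal{I}_v$ that will select strategy $\ell^*_v$ in Stage 2 of SAMPLING. Similarly, conditioned on the outcome of Stage 1 of SAMPLING for the vectors $\{\hat X_i\}_{i\in\mathcal{I}}$, $\hat\mu_v(\hat\Phi)$ is the expected number of the vectors from $\hat{\mathcal{I}}_v$ that will select strategy $\ell^*_v$ in Stage 2 of SAMPLING.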
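As a numerical illustration of the definition of $\mu_v$, the Python sketch below computes $\mu_v(\theta)$ for a given leaf assignment $\theta$ and checks by simulation that $\mathcal{E}[\mu_v(\Phi)] = 2^{-\mathrm{depth}_T(v)}\sum_{i\in\mathcal{I}} p_{i,v}(\ell^*_v)$, which follows from linearity of expectation because each $\Phi_i$ equals $v$ independently with probability $2^{-\mathrm{depth}_T(v)}$. The tree (leaf names and depths), the values $p_{i,v}(\ell^*_v)$ and the function names are all hypothetical.

```python
import random


def mu_v(theta, v, p_star):
    """mu_v(theta): sum of p_{i,v}(l*_v) over the players i with theta_i == v."""
    return sum(p for leaf, p in zip(theta, p_star) if leaf == v)


def sample_theta(rng, n, leaf_depths):
    """Stage 1 for n independent players: leaf u is reached w.p. 2^-depth(u)."""
    leaves = list(leaf_depths)
    weights = [2.0 ** (-leaf_depths[u]) for u in leaves]
    return rng.choices(leaves, weights=weights, k=n)


def estimate_mean(rng, p_star, leaf_depths, v, trials):
    """Monte Carlo estimate of E[mu_v(Phi)]."""
    total = 0.0
    for _ in range(trials):
        total += mu_v(sample_theta(rng, len(p_star), leaf_depths), v, p_star)
    return total / trials


if __name__ == "__main__":
    depths = {"L": 1, "RL": 2, "RR": 2}   # hypothetical full binary tree
    p_star = [0.1, 0.2, 0.05, 0.3]        # hypothetical p_{i,v}(l*_v) at v = "RL"
    exact = 2.0 ** (-depths["RL"]) * sum(p_star)
    print(exact, estimate_mean(random.Random(0), p_star, depths, "RL", 20_000))
```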

Intuitively, if we can couple the choices made by the random vectors $X_i$, $i \in \mathcal{I}$, in Stage 1 of SAMPLING with the choices made by the random vectors $\hat X_i$, $i \in \mathcal{I}$, in Stage 1 of SAMPLING in such a way that, with overwhelming probability, $\mu_v(\Phi)$ and $\hat\mu_v(\hat\Phi)$ are close, then the conditional distributions $F_v(\cdot \mid \Phi)$ and $\hat F_v(\cdot \mid \hat\Phi)$ should also be close in total variation distance. The goal of this section is to make this intuition rigorous. We do this in two steps by showing the following.

1. The choices made in Stage 1 of SAMPLING can be coupled so that the absolute difference $|\mu_v(\Phi) - \hat\mu_v(\hat\Phi)|$ is small with high probability (Lemmas 3.10 and 3.11).

2. If the absolute difference $|\mu_v(\theta) - \hat\mu_v(\theta)|$ is sufficiently small, then so is the total variation distance $\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\|_{TV}$ (Lemmas 3.7, 3.8 and 3.9).

We start with Step 2 of the above program. We use different arguments depending on whether $v$ is a Type A or a Type B leaf. Let $\partial T = L_A \cup L_B$, where $L_A$ is the set of Type A leaves of the cell and $L_B$ the set of Type B leaves of the cell. For some constant $\beta$ to be decided later, we show the following lemmas.

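Lemma 3.7 For some $\theta \in (\partial T)^{\mathcal{I}}$ and $v \in L_A$ suppose that

\[ |\mu_v(\theta) - \mathcal{E}[\mu_v(\Phi)]| \le z^{(\alpha-1)/2} \sqrt{\mathcal{E}[\mu_v(\Phi)] \log z}, \qquad (7) \]

\[ |\hat\mu_v(\theta) - \mathcal{E}[\hat\mu_v(\hat\Phi)]| \le z^{(\alpha-1)/2} \sqrt{\mathcal{E}[\hat\mu_v(\hat\Phi)] \log z}; \qquad (8) \]

then

\[ \big\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\Big(\frac{\sqrt{\log z}}{z^{(1-\alpha)/2}}\Big). \]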



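Lemma 3.8 For some $\theta \in (\partial T)^{\mathcal{I}}$ and $v \in L_B$ suppose that

\[ n_v(\theta) := |\{i : \theta_i = v\}| \ge z^\beta, \qquad (9) \]

\[ |\mu_v(\theta) - \hat\mu_v(\theta)| \le \frac{1 + \sqrt{\log z}\,\sqrt{|\mathcal{I}|}}{z}, \qquad (10) \]

\[ \big|n_v(\theta) - 2^{-\mathrm{depth}_T(v)}|\mathcal{I}|\big| \le \sqrt{3\log z}\,\sqrt{2^{-\mathrm{depth}_T(v)}|\mathcal{I}|}; \qquad (11) \]

then

\[ \big\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\Big(\frac{2^{\mathrm{depth}_T(v)/2}\sqrt{\log z}}{z^{(1+\alpha)/2}}\Big) + O(z^{-\alpha}) + O\big(z^{-(\alpha+\beta-1)/2}\big) + O\Big(\frac{2^{\mathrm{depth}_T(v)/2}\log z}{z^{(1+\alpha+\beta)/2}}\Big). \]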



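Lemma 3.9 For some $\theta \in (\partial T)^{\mathcal{I}}$ and $v \in L_B$ suppose that

\[ n_v(\theta) := |\{i : \theta_i = v\}| < z^\beta; \qquad (12) \]

then

\[ \big\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\big(z^{-(1-\beta)}\big). \]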



The proof of Lemma 3.9 follows from a coupling argument similar to that used in the proof of Lemma 3.13 in [16] and is omitted. The proofs of Lemmas 3.7 and 3.8 can be found in Sections B and C of the appendix, respectively. Lemma 3.7 provides conditions which, if satisfied by some $\theta$ at a leaf of Type A, imply that the conditional distributions $F_v(\cdot \mid \Phi = \theta)$ and $\hat F_v(\cdot \mid \hat\Phi = \theta)$ are close in total variation distance. Similarly, Lemmas 3.8 and 3.9 provide conditions for the leaves of Type B. The following lemmas state that these conditions are satisfied with high probability. Their proof is given in Section D of the appendix.

Lemma 3.10 Let $v \in L_A$. Then

\[ G\Big(\theta : |\mu_v(\theta) - \mathcal{E}[\mu_v(\Phi)]| \le z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\mu_v(\Phi)]\log z}\ \wedge\ |\hat\mu_v(\theta) - \mathcal{E}[\hat\mu_v(\hat\Phi)]| \le z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\hat\mu_v(\hat\Phi)]\log z}\Big) \ge 1 - 4z^{-1/3}. \qquad (13) \]

Lemma 3.11 Let $v \in L_B$. Then

\[ G\Big(\theta : |\mu_v(\theta) - \hat\mu_v(\theta)| \le \frac{1 + \sqrt{\log z}\,\sqrt{|\mathcal{I}|}}{z}\ \wedge\ \big|n_v(\theta) - 2^{-\mathrm{depth}_T(v)}|\mathcal{I}|\big| \le \sqrt{3\log z}\,\sqrt{2^{-\mathrm{depth}_T(v)}|\mathcal{I}|}\Big) \ge 1 - \frac{4}{z^{1/2}}. \qquad (14) \]

Setting $\alpha = 3/5$ and $\beta = 4/5$, combining the above, and using that $\mathrm{depth}_T(v) \le k$, as implied by Claim 3.3, we get (5), regardless of whether $v \in L_A$ or $v \in L_B$.

4 Extensions

Returning to our algorithm (Theorem 2.2), there are several directions in which it can be immediately generalized. To give an idea of the possibilities, let us define a semi-anonymous game to be a game in which

• the players are partitioned into a fixed number of types;

• there is another partition of the players into an arbitrary number of disjoint graphical games of size $O(\log n)$, where $n$ is the total number of players, and of bounded degree, called extended families (graphical games [29] are games in which a node's utility depends only on its neighboring nodes);

and the utility of each player depends on (a) his/her own strategy; (b) the overall number of other players of each type playing each strategy; and (c) in an arbitrary way, the strategy choices of neighboring nodes in his/her own extended family. The following result, which is only indicative of the applicability of our approach, can be shown by extending the discretization method via dynamic programming (details omitted):

Theorem 4.1 There is a PTAS for semi-anonymous games with a fixed number of strategies.

Further generalizations (for example, not bounding the size of the extended families) lead to PPAD-complete problems.

The discretization approach that we have employed so far in this paper to compute Nash equilibria of anonymous games with a fixed number of strategies has surprisingly broad applicability, for example yielding a quasi-PTAS for general games (proof in Appendix A):

Theorem 4.2 In any normal-form game with a constant number of strategies per player, an $\epsilon$-approximate Nash equilibrium can be computed in time $N^{O(\log\log N + \log(1/\epsilon))}$, where $N$ is the description size of the game.

By combining the discretization approach with the techniques of [22] we can find approximate Nash equilibria in a large class of graphical games. It had long been thought that graphical games on trees with two strategies per player can be solved in polynomial time [30], until subtle flaws in the algorithm were discovered [23]. The largest class of graphical games that are known to have a polynomial-time algorithm for Nash equilibria is graphical games on a cycle with two strategies per player [23]. The following result treats a far broader class of games, albeit approximately; its proof is omitted.

Theorem 4.3 There is a PTAS for computing Nash equilibria in graphical games in which each player has a number of strategies bounded by a constant and the graph has bounded degree and $O(\log n)$ treewidth.



5 An Application to Optimization 



We illustrate an interesting application of our method in non-convex optimization. This application relates nicely to the interpretation of our main result (Theorem 2.1) as constructing a sparse cover of the set of distributions of sums of independent unit vectors under the total variation distance. The minimax optimization problem that we present arises in the context of solving repeated anonymous games, using the folk theorem [9], and similar optimization problems arise naturally in economics whenever secure strategies or threats are being computed. The optimization problem that we consider is the following.



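Given functions $f_1, f_2 : \{0, 1, \ldots, n\} \to [0,1]$, solve the optimization problem

\[ \min_{p_1,\ldots,p_n \in [0,1]}\ \max_{a \in \{1,2\}}\ \mathcal{E}_{X_i \sim B(p_i)}\Big[f_a\Big(\sum_{i=1}^n X_i\Big)\Big], \qquad (15) \]

where $\mathcal{E}_{X_i \sim B(p_i)}$ denotes the expectation over the joint measure of independent Bernoulli random variables $X_i$, $i = 1, \ldots, n$, with expectations $p_i$, $i = 1, \ldots, n$.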

We know of no efficient algorithm for solving the above optimization problem. Nevertheless, our technique gives rise to a polynomial time approximation scheme. The idea is to use Theorem 2.1 to show that restricting the search space from $[0,1]^n$ to $\{0, \epsilon, 2\epsilon, \ldots, 1\}^n$ results in a loss of at most $O(\epsilon^{1/6})$ in the value of the optimum. This observation is complemented by the symmetry of the objective function with respect to the $p_i$'s; hence, we can search over the discretized space in time $O(n^{1/\epsilon})$ rather than $(1/\epsilon)^n$. The proof of the following theorem is given in detail in Appendix A.

Theorem 5.1 There is a PTAS for solving the non-convex optimization problem (15).

The algorithm extends to the case that the minimax problem is replaced by a maximin problem. Moreover, our method provides polynomial time approximation schemes for several generalizations of (15), e.g., for the case that the maximum is taken over more than two functions (the case of one function is trivial), the domain of the functions is multidimensional (but with a constant number of dimensions), the functions have several (but a constant number of) arguments, etc. Theorem 5.1 immediately implies the following result.

Corollary 5.2 ([9]) There is a PTAS for computing threat points in repeated anonymous games with a constant number of strategies per player.

6 Open Problems 

Is there a PTAS for the Nash equilibrium problem? Major progress in this direction would be to turn the quasi-PTAS we described in Section 4 for the case of a fixed number of strategies into a true PTAS. This is challenging, of course, but not hopeless. The exhaustive algorithm
need not be completely exhaustive; a more intelligent search 
of the space, possibly in a dynamically varying grid of dis- 
cretized probabilities, could possibly bring improvements 
in the running time. On the other hand, any constant lower 
bound on the approximability would be great progress as 



well; we conjecture that such a bound is possible at least for 
graphical games. 

Obviously, our PTAS is not ready to be implemented and run; the exponent makes it unrealistic for any reasonable $\epsilon$. (As we have argued in the Introduction, its true significance lies in delimiting the implications of the complexity result in [19].) There are ways to improve it, perhaps even substantially. For example, by a more elaborate trickle-down process all trees could be made full binary trees of depth $\log k$, which would remove one of the exponential functions from the exponent of the running time. But a truly practical algorithm would have to start from a new idea, possibly from that of a "less exhaustive search" mentioned in the previous paragraph. In fact, an efficient PTAS for the case of two strategies has been recently suggested [15].
APPENDIX



A Skipped Proofs 

Proof of Claim 3.3. That a tree resulting from TDP has at most $k-1$ leaves follows by induction: it is true when $k = 2$ and, for general $k$, the left subtree has $j$ strategies and thus, by induction, at most $j-1$ leaves, and the right subtree has at most $k+1-j$ strategies and at most $k-j$ leaves; adding, we get the result.

To estimate the number of cells, let us fix the set of strategies and their ordering at the root of the tree (thus the result of the calculation will have to be multiplied by $2^k k!$) and then count the number of trees that could be output by TDP. Suppose that the root has cardinality $m$ and that the children of the root are assigned sets of sizes $j$ and $m+1-j$ (or, in the event of no duplication, $m-j$), respectively. If $j = 2$, then a duplication has to have happened and, for the ordering of the strategies at the left child of the root, there are at most 2 possibilities depending on whether the "divided strategy" is still the largest at the left side; similarly, for the right side there are $m-1$ possibilities: either the divided strategy is still the largest at the right side, or it is not, in which case it has to be inserted at the correct place in the ordering and the last strategy of the right side must be moved to the second place. If $j > 2$, similar considerations show that there are at most $j-1$ possibilities for the left side and 1 possibility for the right side. It follows that the number of trees is bounded from above by the solution $T(k)$ of the recurrence

\[ T(n) = 2\,T(2)\cdot(n-1)\,T(n-1) + \sum_{j=3}^{n-1} (j-1)\,T(j)\cdot\max\{T(n-j),\, T(n+1-j)\}, \]

with $T(2) = 1$. It follows that the total number of trees can be upper-bounded by the function $k^{k^2}$. Taking into account that there are $2^k k!$ choices for the set of strategies and their ordering at the root of the tree, and that each leaf can be of either Type A or Type B, it follows that the total number of cells is bounded by $g(k) = k^{k^2} 2^{k-1} 2^k k!$. ■

Proof of Claim 3.4. The proof follows by a straightforward coupling argument. Indeed, for all $i \in \mathcal{I}$, let us couple the choices made by Stage 1 of SAMPLING $X_i$ and SAMPLING $\hat X_i$ so that the random leaf $\Phi_i \in \partial T$ chosen by SAMPLING $X_i$ and the random leaf $\hat\Phi_i \in \partial T$ chosen by SAMPLING $\hat X_i$ are equal, that is, for all $i \in \mathcal{I}$, in the joint probability space $\Pr[\Phi_i = \hat\Phi_i] = 1$; the existence of such a coupling is straightforward since Stage 1 of both SAMPLING $X_i$ and SAMPLING $\hat X_i$ is the same random walk on $T$. ■

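Proof of Lemma 3.6. Note first that (5) implies, via a union bound, that

\[ G\Big(\theta : \forall v \in \partial T,\ \big\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\Big(\frac{2^k \log z}{z^{1/5}}\Big)\Big) \ge 1 - O(k z^{-1/3}), \qquad (16) \]

since, by Claim 3.3, the number of leaves is at most $k - 1$.

Now suppose that, for a given value of $\theta \in (\partial T)^{\mathcal{I}}$, the following is satisfied:

\[ \forall v \in \partial T,\ \big\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\Big(\frac{2^k \log z}{z^{1/5}}\Big). \qquad (17) \]

Observe that the variables $\{\mathcal{T}_{v,1}\}_{v\in\partial T}$ are conditionally independent given $\Phi$ and, similarly, the variables $\{\hat{\mathcal{T}}_{v,1}\}_{v\in\partial T}$ are conditionally independent given $\hat\Phi$. This, by the coupling lemma, Claim 3.3 and (17), implies that

\[ \big\|F(\cdot \mid \Phi = \theta) - \hat F(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\Big(k\,\frac{2^k \log z}{z^{1/5}}\Big), \]

where we used that, if $\Phi = \hat\Phi$, then $|\mathcal{I}_v| = |\hat{\mathcal{I}}_v|$ for all $v \in \partial T$, so that $\mathcal{T}_{v,2} = |\mathcal{I}_v| - \mathcal{T}_{v,1}$ and $\hat{\mathcal{T}}_{v,2} = |\hat{\mathcal{I}}_v| - \hat{\mathcal{T}}_{v,1}$ are determined by the first coordinates. Therefore, (16) implies

\[ G\Big(\theta : \big\|F(\cdot \mid \Phi = \theta) - \hat F(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\Big(k\,\frac{2^k \log z}{z^{1/5}}\Big)\Big) \ge 1 - O(k z^{-1/3}). \qquad (18) \]

All that remains is to shift the bound of (18) to the unconditional space. The following lemma establishes this reduction.

Lemma A.1 (18) implies

\[ \|F - \hat F\|_{TV} \le O\Big(k\,\frac{2^k \log z}{z^{1/5}}\Big). \]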



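Proof of Lemma A.1. Let us denote by

\[ \mathrm{Good} = \Big\{\theta \in (\partial T)^{\mathcal{I}} : \big\|F(\cdot \mid \Phi = \theta) - \hat F(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\Big(k\,\frac{2^k \log z}{z^{1/5}}\Big)\Big\} \qquad (19) \]

and let $\mathrm{Bad} = (\partial T)^{\mathcal{I}} \setminus \mathrm{Good}$. By (18), it follows that $G(\mathrm{Bad}) \le O(k z^{-1/3})$. Then

\[ \begin{aligned} \|F - \hat F\|_{TV} &= \frac12 \sum_t |F(t) - \hat F(t)| \\ &= \frac12 \sum_t \Big|\sum_\theta F(t \mid \Phi = \theta)\, G(\Phi = \theta) - \hat F(t \mid \hat\Phi = \theta)\, \hat G(\hat\Phi = \theta)\Big| \\ &= \frac12 \sum_t \Big|\sum_\theta \big(F(t \mid \Phi = \theta) - \hat F(t \mid \hat\Phi = \theta)\big)\, G(\theta)\Big| \qquad (\text{using } G(\theta) = \hat G(\theta),\ \forall\theta) \\ &\le \frac12 \sum_t \sum_\theta |F(t \mid \Phi = \theta) - \hat F(t \mid \hat\Phi = \theta)|\, G(\theta) \\ &= \sum_{\theta\in\mathrm{Good}} G(\theta)\Big(\frac12\sum_t |F(t \mid \Phi = \theta) - \hat F(t \mid \hat\Phi = \theta)|\Big) + \sum_{\theta\in\mathrm{Bad}} G(\theta)\Big(\frac12\sum_t |F(t \mid \Phi = \theta) - \hat F(t \mid \hat\Phi = \theta)|\Big) \\ &\le \sum_{\theta\in\mathrm{Good}} G(\theta) \cdot O\Big(k\,\frac{2^k\log z}{z^{1/5}}\Big) + \sum_{\theta\in\mathrm{Bad}} G(\theta) \\ &\le O\Big(k\,\frac{2^k\log z}{z^{1/5}}\Big) + O(k z^{-1/3}), \end{aligned} \]

which is $O(k\, 2^k \log z / z^{1/5})$, since $z^{-1/3} \le z^{-1/5}$. ■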
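The chain of inequalities above rests on a convexity fact: if two mixtures share the same mixing distribution $G$, their total variation distance is at most $\sum_\theta G(\theta)\,\|F(\cdot\mid\theta) - \hat F(\cdot\mid\theta)\|_{TV}$, and splitting that sum into Good and Bad gives the bound. A small Python check on random finite distributions follows; all function names are hypothetical and the distributions are random illustrative data.

```python
import random


def tv(p, q):
    """Total variation distance, (1/2) * L1, between dicts outcome -> probability."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def mixture(weights, components):
    """Law of t when theta ~ weights and t | theta ~ components[theta]."""
    out = {}
    for w, comp in zip(weights, components):
        for t, pr in comp.items():
            out[t] = out.get(t, 0.0) + w * pr
    return out


def random_dist(rng, support):
    """A random probability distribution on the given finite support."""
    raw = [rng.random() + 1e-9 for _ in support]
    total = sum(raw)
    return {s: r / total for s, r in zip(support, raw)}


if __name__ == "__main__":
    rng = random.Random(0)
    G = random_dist(rng, range(3))
    weights = [G[theta] for theta in range(3)]
    F_cond = [random_dist(rng, range(5)) for _ in range(3)]
    F_hat_cond = [random_dist(rng, range(5)) for _ in range(3)]
    lhs = tv(mixture(weights, F_cond), mixture(weights, F_hat_cond))
    rhs = sum(w * tv(f, fh) for w, f, fh in zip(weights, F_cond, F_hat_cond))
    print(lhs, "<=", rhs)
```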



Proof of Theorem 4.2. Let $p$ be the number of players and $s$ the number of strategies per player, which we assume to be a constant; the input size is $N = p s^p$. Consider a new $p$-player game in which the set of pure strategies of each player is the set of all distributions over the $s$ strategies of the original game whose probabilities are integer multiples of $\delta = \frac{\epsilon}{2ps}$. We claim that, if we search over all the pure strategy profiles of the new game, we are bound to discover an $\epsilon$-approximate Nash equilibrium of the original game. To prove this, it suffices to notice the following, which is proven below by two applications of the coupling lemma.

Lemma A.2 Let $(x_1, \ldots, x_p)$ be a mixed Nash equilibrium of the original game and $(\hat x_1, \ldots, \hat x_p)$ be another set of mixed strategies, where, for all $i, j$, $\hat x_i(j)$ is an integer multiple of $\delta$, $|\hat x_i(j) - x_i(j)| \le \delta$ and, if $x_i(j) = 0$, then also $\hat x_i(j) = 0$. Then $(\hat x_1, \ldots, \hat x_p)$ is an $\epsilon$-approximate Nash equilibrium of the original game.

The number of pure strategy profiles of the new game that we have to search over is at most $((1/\delta)^s)^p$, which is easily seen to

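Proof of Lemma A.2. For every player $i$ and strategy $j$, let $U^i_j$ and $\hat U^i_j$ be the expected utility of player $i$ if she plays $j$ and the other players play $\{x_{i'}\}_{i'\ne i}$ and $\{\hat x_{i'}\}_{i'\ne i}$, respectively. The difference between $U^i_j$ and $\hat U^i_j$ can be bounded as follows:

\[ |U^i_j - \hat U^i_j| \le \|(x_1, \ldots, x_p) - (\hat x_1, \ldots, \hat x_p)\|_{TV}, \]

where the right hand side of the above expression represents the total variation distance between the compound distributions $(x_1, \ldots, x_p)$ and $(\hat x_1, \ldots, \hat x_p)$, and we used the fact that the payoff functions of the players lie in $[0,1]$. We will show that

\[ \|(x_1, \ldots, x_p) - (\hat x_1, \ldots, \hat x_p)\|_{TV} \le \frac{\epsilon}{2}. \]

Indeed, for all $i$, let $X_i$ be a random $s$-dimensional vector such that $X_i = e_j$ with probability $x_i(j)$, and suppose that the vectors $\{X_i\}_i$ are independent. Similarly, define vectors $\{\hat X_i\}_i$. The coupling lemma implies that, for any coupling of $\{X_i\}_i$ and $\{\hat X_i\}_i$,

\[ \|(X_1, \ldots, X_p) - (\hat X_1, \ldots, \hat X_p)\|_{TV} \le \Pr[(X_1, \ldots, X_p) \ne (\hat X_1, \ldots, \hat X_p)], \]

which, by a union bound, implies

\[ \|(X_1, \ldots, X_p) - (\hat X_1, \ldots, \hat X_p)\|_{TV} \le \sum_i \Pr[X_i \ne \hat X_i]. \]

Let us now fix a coupling for which, for all $i$,

\[ \Pr[X_i \ne \hat X_i] = \|X_i - \hat X_i\|_{TV}. \]

Such a coupling exists by the coupling lemma and the fact that the random vectors $\{X_i\}_i$ are independent and so are the random vectors $\{\hat X_i\}_i$. Combining the above, we get

\[ \|(X_1, \ldots, X_p) - (\hat X_1, \ldots, \hat X_p)\|_{TV} \le \sum_i \|X_i - \hat X_i\|_{TV}. \]

Observe finally that

\[ \|(x_1, \ldots, x_p) - (\hat x_1, \ldots, \hat x_p)\|_{TV} = \|(X_1, \ldots, X_p) - (\hat X_1, \ldots, \hat X_p)\|_{TV} \]

and, for all $i$,

\[ \|X_i - \hat X_i\|_{TV} = \|x_i - \hat x_i\|_{TV} \le \delta s = \frac{\epsilon}{2p}. \]

It follows that

\[ \|(x_1, \ldots, x_p) - (\hat x_1, \ldots, \hat x_p)\|_{TV} \le \frac{\epsilon}{2}. \]

Hence, for all $i, j$,

\[ |U^i_j - \hat U^i_j| \le \|(x_1, \ldots, x_p) - (\hat x_1, \ldots, \hat x_p)\|_{TV} \le \frac{\epsilon}{2}, \]

which implies that $(\hat x_1, \ldots, \hat x_p)$ is an $\epsilon$-approximate Nash equilibrium of the original game: every strategy $j$ in the support of $\hat x_i$ is also in the support of $x_i$, hence a best response to $\{x_{i'}\}_{i'\ne i}$, so for every $j'$ we get $\hat U^i_j \ge U^i_j - \frac{\epsilon}{2} \ge U^i_{j'} - \frac{\epsilon}{2} \ge \hat U^i_{j'} - \epsilon$. ■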
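The two facts used in the proof of Lemma A.2, namely that the distance between product distributions is at most the sum of the per-player distances, and that the expectation of a $[0,1]$-valued payoff moves by at most the total variation distance, can be checked numerically on a tiny two-player example. The strategies, probabilities, payoff and function names below are hypothetical.

```python
from itertools import product


def tv(p, q):
    """Total variation distance, (1/2) * L1, between dicts outcome -> probability."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def product_distribution(marginals):
    """Joint law of independent players; marginals[i] maps strategy -> probability."""
    joint = {}
    for combo in product(*(list(m.items()) for m in marginals)):
        profile = tuple(strategy for strategy, _ in combo)
        prob = 1.0
        for _, pr in combo:
            prob *= pr
        joint[profile] = prob
    return joint


def expected_payoff(joint, payoff):
    """E[payoff(profile)] for a payoff with values in [0, 1]."""
    return sum(pr * payoff(profile) for profile, pr in joint.items())


def example_payoff(profile):
    """Hypothetical utility in [0, 1]: 1 if both players match, else 0.25."""
    return 1.0 if profile[0] == profile[1] else 0.25


if __name__ == "__main__":
    x = [{0: 0.5, 1: 0.5}, {0: 0.2, 1: 0.3, 2: 0.5}]
    x_hat = [{0: 0.4, 1: 0.6}, {0: 0.25, 1: 0.25, 2: 0.5}]
    joint, joint_hat = product_distribution(x), product_distribution(x_hat)
    gap = abs(expected_payoff(joint, example_payoff) - expected_payoff(joint_hat, example_payoff))
    print(gap, tv(joint, joint_hat), sum(tv(a, b) for a, b in zip(x, x_hat)))
```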



Proof of Theorem 5.1. It is not hard to see that, for any sets of probabilities $\{p_i\}_i$ and $\{p'_i\}_i$, and for any $a \in \{1,2\}$,

\[ \Big|\mathcal{E}_{X_i \sim B(p_i)}\Big[f_a\Big(\sum_i X_i\Big)\Big] - \mathcal{E}_{Y_i \sim B(p'_i)}\Big[f_a\Big(\sum_i Y_i\Big)\Big]\Big| \le \Big\|\sum_i X_i - \sum_i Y_i\Big\|_{TV}, \]

where, in the right hand side of the above, $\{X_i\}_i$ is a set of independent Bernoulli random variables with expectations $\{p_i\}_i$ and $\{Y_i\}_i$ a set of independent Bernoulli random variables with expectations $\{p'_i\}_i$.

Suppose now that $\{p^*_i\}_i$ is the set of probabilities achieving the optimum value for (15). It follows from the above that, if we perturb the $p^*_i$'s to another set of probabilities $\{p'^*_i\}_i$, the value of the minimax problem is only affected by an additive term $\|\sum_i X^*_i - \sum_i Y^*_i\|_{TV}$, where $X^*_i \sim B(p^*_i)$ and $Y^*_i \sim B(p'^*_i)$, for all $i$.

It follows from Theorem 2.1 that, for any set of probabilities $\{p^*_i\}_i$, there exists another set of $\epsilon$-"discretized" probabilities $\{p'^*_i\}_i$, that is, $p'^*_i$ is an integer multiple of $\epsilon$ for all $i$, such that

\[ \Big\|\sum_i X^*_i - \sum_i Y^*_i\Big\|_{TV} \le O(\epsilon^{1/6}). \]

Hence, we can restrict the optimization to $\epsilon$-discretized probabilities with an additive loss of $O(\epsilon^{1/6})$ in the value of the optimum. Even so, the search space is of size $\Omega((1/\epsilon)^n)$, which is exponential in the input size $O(n)$. By observing that the objective function is symmetric with respect to the set of probabilities $\{p_i\}_i$, we can prune the search space to searching only over the partitions of $n$ unlabeled objects into $1/\epsilon + 1$ bins (one for each value in $\{0, \epsilon, \ldots, 1\}$), that is, $O(n^{1/\epsilon})$ possible partitions. This results in a polynomial time approximation scheme. ■
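The search described in this proof can be sketched in Python for small $n$. Assuming a grid step $\epsilon = 1/m$ and hypothetical payoff functions `f1`, `f2` with values in $[0,1]$, the sketch evaluates the objective of (15) exactly by dynamic programming over the distribution of $\sum_i X_i$ and, exploiting symmetry, enumerates only sorted tuples (multisets) of grid values. It does not implement or verify the $O(\epsilon^{1/6})$ loss bound from Theorem 2.1; it only illustrates why symmetry reduces the $(m+1)^n$ grid points to $\binom{n+m}{m}$ candidates.

```python
from itertools import combinations_with_replacement, product


def sum_distribution(ps):
    """pmf of a sum of independent Bernoulli(p_i) variables, by dynamic programming."""
    dist = [1.0]
    for p in ps:
        nxt = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            nxt[s] += mass * (1.0 - p)
            nxt[s + 1] += mass * p
        dist = nxt
    return dist


def objective(ps, f1, f2):
    """max over a in {1, 2} of E[f_a(X_1 + ... + X_n)] with X_i ~ B(p_i)."""
    dist = sum_distribution(ps)
    return max(sum(m * f(s) for s, m in enumerate(dist)) for f in (f1, f2))


def discretized_minimax(n, m, f1, f2):
    """Minimise over {0, 1/m, ..., 1}^n, visiting one sorted tuple per multiset."""
    grid = [j / m for j in range(m + 1)]
    return min((objective(ps, f1, f2), ps)
               for ps in combinations_with_replacement(grid, n))


def f1(s):
    """Hypothetical payoff in [0, 1]."""
    return 1.0 if s == 2 else 0.0


def f2(s):
    """Hypothetical payoff in [0, 1] (for n = 4)."""
    return (s / 4) ** 2


if __name__ == "__main__":
    n, m = 4, 4
    value, ps = discretized_minimax(n, m, f1, f2)
    grid = [j / m for j in range(m + 1)]
    brute = min(objective(q, f1, f2) for q in product(grid, repeat=n))
    print(value, ps, brute)
```

The brute-force minimum over all $(m+1)^n$ grid points and the multiset search agree, since permuting the $p_i$ does not change the law of $\sum_i X_i$.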

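B Proof of Lemma 3.7

Proof: By the assumption it follows that

\[ |\mu_v(\theta) - \hat\mu_v(\theta)| \le |\mathcal{E}[\mu_v(\Phi)] - \mathcal{E}[\hat\mu_v(\hat\Phi)]| + z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\mu_v(\Phi)]\log z} + z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\hat\mu_v(\hat\Phi)]\log z}. \]

Moreover, note that

\[ \mathcal{E}[\mu_v(\Phi)] = 2^{-\mathrm{depth}_T(v)} \sum_{i\in\mathcal{I}} p_{i,v}(\ell^*_v) \]

and, similarly,

\[ \mathcal{E}[\hat\mu_v(\hat\Phi)] = 2^{-\mathrm{depth}_T(v)} \sum_{i\in\mathcal{I}} \hat p_{i,v}(\ell^*_v). \]

By the definition of the ROUNDING procedure it follows that

\[ |\mathcal{E}[\mu_v(\Phi)] - \mathcal{E}[\hat\mu_v(\hat\Phi)]| \le 2^{-\mathrm{depth}_T(v)} \cdot \frac{1}{z}. \]

Hence it follows that

\[ |\mu_v(\theta) - \hat\mu_v(\theta)| \le \frac{2^{-\mathrm{depth}_T(v)}}{z} + z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\mu_v(\Phi)]\log z} + z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\hat\mu_v(\hat\Phi)]\log z}. \qquad (20) \]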



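Let $\mathcal{N}_v(\theta) := \{i : \theta_i = v\}$, $n_v := |\mathcal{N}_v(\theta)|$. Conditioned on $\Phi = \theta$, the distribution of $\mathcal{T}_{v,1}$ is the sum of $n_v$ independent Bernoulli random variables $\{Z_i\}_{i\in\mathcal{N}_v(\theta)}$ with expectations $\mathcal{E}[Z_i] = p_{i,v}(\ell^*_v) \le \lfloor z^\alpha\rfloor/z$. Similarly, conditioned on $\hat\Phi = \theta$, the distribution of $\hat{\mathcal{T}}_{v,1}$ is the sum of $n_v$ independent Bernoulli random variables $\{\hat Z_i\}_{i\in\mathcal{N}_v(\theta)}$ with expectations $\mathcal{E}[\hat Z_i] = \hat p_{i,v}(\ell^*_v) \le \lfloor z^\alpha\rfloor/z$. Note that

\[ \sum_{i\in\mathcal{N}_v(\theta)} \mathcal{E}[Z_i] = \mu_v(\theta) \]

and, similarly,

\[ \sum_{i\in\mathcal{N}_v(\theta)} \mathcal{E}[\hat Z_i] = \hat\mu_v(\theta). \]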

Without loss of generality, let us assume that $\mathcal{E}[\mu_v(\Phi)] \ge \mathcal{E}[\hat\mu_v(\hat\Phi)]$. Let us further distinguish two cases, for some constant $\tau < 1 - \alpha$ to be decided later.



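Case 1: $\mathcal{E}[\mu_v(\Phi)] \le \frac{1}{z^\tau}$.

From (7) it follows that

\[ \mu_v(\theta) \le \mathcal{E}[\mu_v(\Phi)] + z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\mu_v(\Phi)]\log z} \le \frac{1}{z^\tau} + z^{(\alpha-1-\tau)/2}\sqrt{\log z} =: g(z). \]

Similarly, because $\mathcal{E}[\hat\mu_v(\hat\Phi)] \le \mathcal{E}[\mu_v(\Phi)] \le \frac{1}{z^\tau}$, (8) gives $\hat\mu_v(\theta) \le g(z)$.

By Markov's inequality, $\Pr_{\Phi=\theta}\big[\sum_{i\in\mathcal{N}_v(\theta)} Z_i \ge 1\big] \le g(z)$ and, similarly, $\Pr_{\hat\Phi=\theta}\big[\sum_{i\in\mathcal{N}_v(\theta)} \hat Z_i \ge 1\big] \le g(z)$. Hence,

\[ \Big|\Pr_{\Phi=\theta}\Big[\sum_{i} Z_i = 0\Big] - \Pr_{\hat\Phi=\theta}\Big[\sum_{i} \hat Z_i = 0\Big]\Big| \le g(z), \qquad \Big|\Pr_{\Phi=\theta}\Big[\sum_{i} Z_i \ge 1\Big] - \Pr_{\hat\Phi=\theta}\Big[\sum_{i} \hat Z_i \ge 1\Big]\Big| \le g(z). \]

It then follows easily that

\[ \big\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le 4g(z) = 4\Big(\frac{1}{z^\tau} + \frac{\sqrt{\log z}}{z^{(1-\alpha+\tau)/2}}\Big). \qquad (21) \]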



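Case 2: $\mathcal{E}[\mu_v(\Phi)] > \frac{1}{z^\tau}$.

The following claim was proven in [16], Lemma 3.9.

Claim B.1 For any set of independent Bernoulli random variables $\{Z_i\}_i$ with expectations $\mathcal{E}[Z_i] \le \lfloor z^\alpha\rfloor/z$,

\[ \Big\|\sum_i Z_i - \mathrm{Poisson}\Big(\sum_i \mathcal{E}[Z_i]\Big)\Big\|_{TV} \le \frac{1}{z^{1-\alpha}}. \]

By application of this claim it follows that

\[ \Big\|\sum_{i\in\mathcal{N}_v(\theta)} Z_i - \mathrm{Poisson}(\mu_v(\theta))\Big\|_{TV} \le \frac{1}{z^{1-\alpha}}, \qquad (22) \]

\[ \Big\|\sum_{i\in\mathcal{N}_v(\theta)} \hat Z_i - \mathrm{Poisson}(\hat\mu_v(\theta))\Big\|_{TV} \le \frac{1}{z^{1-\alpha}}. \qquad (23) \]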
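Claim B.1 is quoted from [16]; a bound of this type also follows from the classical Barbour-Hall inequality, which bounds the distance by $\frac{1-e^{-\lambda}}{\lambda}\sum_i p_i^2 \le \max_i p_i$ with $\lambda = \sum_i p_i$. The Python sketch below computes the exact total variation distance between a Bernoulli sum and the matching Poisson distribution and compares it with $z^{-(1-\alpha)}$. The values of $z$, $\alpha$ and the $p_i$ are illustrative assumptions, and the function names are hypothetical.

```python
import math


def bernoulli_sum_pmf(ps):
    """pmf of a sum of independent Bernoulli(p_i), by dynamic programming."""
    dist = [1.0]
    for p in ps:
        nxt = [0.0] * (len(dist) + 1)
        for s, mass in enumerate(dist):
            nxt[s] += mass * (1.0 - p)
            nxt[s + 1] += mass * p
        dist = nxt
    return dist


def poisson_pmf(lam, kmax):
    """Poisson(lam) pmf on {0, ..., kmax}, computed in log space (lam > 0)."""
    return [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(kmax + 1)]


def tv_to_poisson(ps):
    """TV distance between sum of Bernoulli(p_i) and Poisson(sum p_i)."""
    pb = bernoulli_sum_pmf(ps)
    po = poisson_pmf(sum(ps), len(pb) - 1)
    tail = max(0.0, 1.0 - sum(po))   # Poisson mass above n, where the sum has none
    return 0.5 * (sum(abs(a - b) for a, b in zip(pb, po)) + tail)


if __name__ == "__main__":
    z, alpha = 1000, 0.6
    cap = math.floor(z ** alpha) / z          # floor(z^alpha) / z, the Type A threshold
    ps = [0.05, 0.01, cap, 0.02, 0.04] * 10   # illustrative expectations, all <= cap
    print(tv_to_poisson(ps), "<=", 1 / z ** (1 - alpha))
```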



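We study next the distance between the two Poisson distributions. We use the following lemma, whose proof is postponed till later in this section.

Lemma B.2 If $\lambda = \lambda_0 + D$ for some $D > 0$, $\lambda_0 > 0$, then

\[ \|\mathrm{Poisson}(\lambda) - \mathrm{Poisson}(\lambda_0)\|_{TV} \le D\sqrt{\frac{2}{\lambda_0}}. \]

An application of Lemma B.2 gives

\[ \|\mathrm{Poisson}(\mu_v(\theta)) - \mathrm{Poisson}(\hat\mu_v(\theta))\|_{TV} \le |\mu_v(\theta) - \hat\mu_v(\theta)|\sqrt{\frac{2}{\min\{\mu_v(\theta), \hat\mu_v(\theta)\}}}. \]

We conclude with the following lemma, proved at the end of this section.

Lemma B.3 From (7), (8), (20) and the assumption $\mathcal{E}[\mu_v(\Phi)] > \frac{1}{z^\tau}$, it follows that

\[ \frac{|\mu_v(\theta) - \hat\mu_v(\theta)|^2}{\min\{\mu_v(\theta), \hat\mu_v(\theta)\}} \le \frac{72\log z}{z^{1-\alpha}}. \qquad (24) \]

Combining (22), (23), (24) and Lemma B.2 we get

\[ \big\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le \frac{2}{z^{1-\alpha}} + \sqrt{\frac{144\log z}{z^{1-\alpha}}}, \]

which implies

\[ \big\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\Big(\frac{\sqrt{\log z}}{z^{(1-\alpha)/2}}\Big). \qquad (25) \]

Taking $\tau > (1-\alpha)/2$, we get from (21) and (25) that in both cases

\[ \big\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\Big(\frac{\sqrt{\log z}}{z^{(1-\alpha)/2}}\Big). \qquad (26) \]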



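Proof of Lemma B.2. We make use of the following lemmas.

Lemma B.4 If $\lambda, \lambda_0 > 0$, the Kullback-Leibler divergence between $\mathrm{Poisson}(\lambda_0)$ and $\mathrm{Poisson}(\lambda)$ is given by

\[ \Delta_{KL}(\mathrm{Poisson}(\lambda_0) \,\|\, \mathrm{Poisson}(\lambda)) = \lambda\Big(1 - \frac{\lambda_0}{\lambda} + \frac{\lambda_0}{\lambda}\log\frac{\lambda_0}{\lambda}\Big). \]

Lemma B.5 (e.g. [14]) If $P$ and $Q$ are probability measures on the same measure space and $P$ is absolutely continuous with respect to $Q$, then

\[ \|P - Q\|_{TV} \le \sqrt{2\,\Delta_{KL}(P \,\|\, Q)}. \]

By simple calculus (using $\log x \le x - 1$ with $x = \lambda_0/\lambda$) we have that

\[ \Delta_{KL}(\mathrm{Poisson}(\lambda_0) \,\|\, \mathrm{Poisson}(\lambda)) = \lambda\Big(1 - \frac{\lambda_0}{\lambda} + \frac{\lambda_0}{\lambda}\log\frac{\lambda_0}{\lambda}\Big) \le \lambda\Big(1 - \frac{\lambda_0}{\lambda}\Big)^2 = \frac{D^2}{\lambda} \le \frac{D^2}{\lambda_0}. \]

Then, by Lemma B.5, it follows that

\[ \|\mathrm{Poisson}(\lambda) - \mathrm{Poisson}(\lambda_0)\|_{TV} \le D\sqrt{\frac{2}{\lambda_0}}. \]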
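Lemma B.2 can be sanity-checked numerically. The Python sketch below (function names hypothetical) computes the total variation distance between two Poisson distributions by truncating far in the right tail and adding the leftover mass, so the computed value slightly over-estimates the true distance. It then compares that value with $D\sqrt{2/\lambda_0}$ for a few illustrative pairs $(\lambda_0, D)$.

```python
import math


def poisson_pmf(lam, kmax):
    """Poisson(lam) pmf on {0, ..., kmax}, computed in log space (lam > 0)."""
    return [math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(kmax + 1)]


def poisson_tv(lam, lam0):
    """Numerical TV distance (1/2 * L1) between Poisson(lam) and Poisson(lam0).

    Truncates far in both right tails and adds the leftover mass, so the
    returned value is a slight over-estimate of the true distance.
    """
    top = max(lam, lam0)
    kmax = int(top + 20 * math.sqrt(top) + 50)
    p, q = poisson_pmf(lam, kmax), poisson_pmf(lam0, kmax)
    leftover = max(0.0, 1.0 - sum(p)) + max(0.0, 1.0 - sum(q))
    return 0.5 * (sum(abs(a - b) for a, b in zip(p, q)) + leftover)


def lemma_b2_bound(lam0, D):
    """Right-hand side D * sqrt(2 / lam0) of Lemma B.2."""
    return D * math.sqrt(2.0 / lam0)


if __name__ == "__main__":
    for lam0, D in [(5.0, 0.5), (20.0, 2.0), (100.0, 1.0)]:
        print(lam0, D, poisson_tv(lam0 + D, lam0), lemma_b2_bound(lam0, D))
```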



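Proof of Lemma B.3. From (20) and the assumption $\mathcal{E}[\hat\mu_v(\hat\Phi)] \le \mathcal{E}[\mu_v(\Phi)]$ we have

\[ |\mu_v(\theta) - \hat\mu_v(\theta)| \le \frac1z + 2z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\mu_v(\Phi)]\log z}. \]

From (7) it follows that

\[ \mu_v(\theta) \ge \mathcal{E}[\mu_v(\Phi)] - z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\mu_v(\Phi)]\log z} = \sqrt{\mathcal{E}[\mu_v(\Phi)]}\Big(\sqrt{\mathcal{E}[\mu_v(\Phi)]} - z^{(\alpha-1)/2}\sqrt{\log z}\Big). \qquad (27) \]

Since $\mathcal{E}[\mu_v(\Phi)] > z^{-\tau}$ and $\tau < 1 - \alpha$, for sufficiently large $z$ (depending only on $\alpha$ and $\tau$) we have $\frac12\sqrt{\mathcal{E}[\mu_v(\Phi)]} \ge \frac12 z^{-\tau/2} \ge z^{(\alpha-1)/2}\sqrt{\log z}$. Hence,

\[ \mu_v(\theta) \ge \frac12\,\mathcal{E}[\mu_v(\Phi)]. \qquad (28) \]

Similarly, starting from $\mathcal{E}[\hat\mu_v(\hat\Phi)] \ge \mathcal{E}[\mu_v(\Phi)] - \frac1z \ge \frac{1}{2z^\tau}$ (which holds for sufficiently large $z$) and using (8), it can be shown that for sufficiently large $z$

\[ \hat\mu_v(\theta) \ge \frac12\,\mathcal{E}[\hat\mu_v(\hat\Phi)]. \qquad (29) \]

From (28) and (29) it follows that

\[ \min\{\mu_v(\theta), \hat\mu_v(\theta)\} \ge \frac12\min\{\mathcal{E}[\mu_v(\Phi)], \mathcal{E}[\hat\mu_v(\hat\Phi)]\} = \frac12\,\mathcal{E}[\hat\mu_v(\hat\Phi)] \ge \frac12\,\mathcal{E}[\mu_v(\Phi)] - \frac{1}{2z} \ge \frac14\,\mathcal{E}[\mu_v(\Phi)], \]

where we used that $\mathcal{E}[\mu_v(\Phi)] > z^{-\tau} \ge \frac{2}{z}$ for sufficiently large $z$, since $\tau < 1 - \alpha < 1$. Combining the above we get

\[ \frac{|\mu_v(\theta) - \hat\mu_v(\theta)|^2}{\min\{\mu_v(\theta), \hat\mu_v(\theta)\}} \le \frac{4}{z^2\,\mathcal{E}[\mu_v(\Phi)]} + \frac{16\sqrt{\log z}}{z^{(3-\alpha)/2}\sqrt{\mathcal{E}[\mu_v(\Phi)]}} + \frac{16\log z}{z^{1-\alpha}} \le \frac{4}{z^{2-\tau}} + \frac{16\sqrt{\log z}}{z^{(3-\alpha-\tau)/2}} + \frac{16\log z}{z^{1-\alpha}} \le \frac{72\log z}{z^{1-\alpha}}, \qquad (30) \]

since $2 - \tau > 1 - \alpha$ and $(3-\alpha-\tau)/2 > 1 - \alpha$, assuming sufficiently large $z$. ■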

C Proof of Lemma 3.8

Proof: We will derive our bound by approximating with the translated Poisson distribution, which is defined next.

Definition C.1 ([37]) We say that an integer random variable $Y$ has a translated Poisson distribution with parameters $\mu$ and $\sigma^2$ and write

\[ \mathcal{L}(Y) = TP(\mu, \sigma^2) \]

if $\mathcal{L}(Y - \lfloor \mu - \sigma^2 \rfloor) = \mathrm{Poisson}(\sigma^2 + \{\mu - \sigma^2\})$, where $\{\mu - \sigma^2\}$ represents the fractional part of $\mu - \sigma^2$.
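Definition C.1 translates directly into code. The Python sketch below (names hypothetical) builds the probability mass function of $TP(\mu, \sigma^2)$ by shifting a Poisson distribution with parameter $\sigma^2 + \{\mu - \sigma^2\}$ by $\lfloor \mu - \sigma^2 \rfloor$, truncating the Poisson part far in its right tail. By construction the mean is exactly $\mu$ and the variance $\sigma^2 + \{\mu - \sigma^2\}$ lies in $[\sigma^2, \sigma^2 + 1)$, which the checks below confirm numerically.

```python
import math


def translated_poisson_pmf(mu, sigma2, extra=60):
    """pmf of TP(mu, sigma2) as a dict {integer y: probability}.

    Following Definition C.1: Y - floor(mu - sigma2) ~ Poisson(sigma2 + frac(mu - sigma2)).
    The Poisson part is truncated far in its right tail.
    """
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    shift = math.floor(mu - sigma2)
    lam = sigma2 + (mu - sigma2 - shift)   # (mu - sigma2) - floor(mu - sigma2) is the fractional part
    kmax = int(lam + 20 * math.sqrt(lam) + extra)
    return {shift + j: math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))
            for j in range(kmax + 1)}


def moments(pmf):
    """Mean and variance of a finite pmf given as {value: probability}."""
    mean = sum(y * p for y, p in pmf.items())
    var = sum((y - mean) ** 2 * p for y, p in pmf.items())
    return mean, var


if __name__ == "__main__":
    pmf = translated_poisson_pmf(12.3, 4.7)
    print(moments(pmf))   # mean close to 12.3, variance in [4.7, 5.7)
```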

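The following lemma provides a bound for the total variation distance between two translated Poisson distributions with different parameters.

Lemma C.2 ([5]) Let $\mu_1, \mu_2 \in \mathbb{R}$ and $\sigma_1^2, \sigma_2^2 \in \mathbb{R}_+ \setminus \{0\}$ be such that $\lfloor \mu_1 - \sigma_1^2 \rfloor \le \lfloor \mu_2 - \sigma_2^2 \rfloor$. Then

\[ \|TP(\mu_1, \sigma_1^2) - TP(\mu_2, \sigma_2^2)\|_{TV} \le \frac{|\mu_1 - \mu_2|}{\sigma_1} + \frac{|\sigma_1^2 - \sigma_2^2| + 1}{\sigma_1^2}. \]

The following lemma was proven in [16], Lemma 3.14.

Lemma C.3 Let $z > 0$ be some integer and $\{Z_i\}_{i=1}^m$, where $m \ge z^\beta$, be any set of independent Bernoulli random variables with expectations $\mathcal{E}[Z_i] \in \big[\frac{\lfloor z^\alpha \rfloor}{z}, \frac{1}{2}\big]$. Let $\mu = \sum_{i=1}^m \mathcal{E}[Z_i]$ and $\sigma^2 = \sum_{i=1}^m \mathcal{E}[Z_i](1 - \mathcal{E}[Z_i])$. Then

\[ \Big\|\sum_{i=1}^m Z_i - TP(\mu, \sigma^2)\Big\|_{TV} \le O\big(z^{-(\alpha+\beta-1)/2}\big). \]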

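Let $\mathcal{N}_v(\theta) := \{i : \theta_i = v\}$, $n_v(\theta) = |\mathcal{N}_v(\theta)|$. Conditioned on $\Phi = \theta$, the distribution of $\mathcal{T}_{v,1}$ is the sum of $n_v(\theta)$ independent Bernoulli random variables $\{Z_i\}_{i\in\mathcal{N}_v(\theta)}$ with expectations $\mathcal{E}[Z_i] = p_{i,v}(\ell^*_v)$. Similarly, conditioned on $\hat\Phi = \theta$, the distribution of $\hat{\mathcal{T}}_{v,1}$ is the sum of $n_v(\theta)$ independent Bernoulli random variables $\{\hat Z_i\}_{i\in\mathcal{N}_v(\theta)}$ with expectations $\mathcal{E}[\hat Z_i] = \hat p_{i,v}(\ell^*_v)$. Note that

\[ \sum_{i\in\mathcal{N}_v(\theta)} \mathcal{E}[Z_i] = \mu_v(\theta) \]

and, similarly,

\[ \sum_{i\in\mathcal{N}_v(\theta)} \mathcal{E}[\hat Z_i] = \hat\mu_v(\theta). \]

Setting $\mu_1 := \mu_v(\theta)$, $\mu_2 := \hat\mu_v(\theta)$ and

\[ \sigma_1^2 = \sum_{i\in\mathcal{N}_v(\theta)} \mathcal{E}[Z_i](1 - \mathcal{E}[Z_i]), \qquad \sigma_2^2 = \sum_{i\in\mathcal{N}_v(\theta)} \mathcal{E}[\hat Z_i](1 - \mathcal{E}[\hat Z_i]), \]

we have from Lemma C.3 that

\[ \Big\|\sum_{i\in\mathcal{N}_v(\theta)} Z_i - TP(\mu_1, \sigma_1^2)\Big\|_{TV} \le O\big(z^{-(\alpha+\beta-1)/2}\big), \qquad \Big\|\sum_{i\in\mathcal{N}_v(\theta)} \hat Z_i - TP(\mu_2, \sigma_2^2)\Big\|_{TV} \le O\big(z^{-(\alpha+\beta-1)/2}\big). \qquad (31) \]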



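It remains to bound the total variation distance between the translated Poisson distributions using Lemma C.2. Without loss of generality let us assume $\lfloor \mu_1 - \sigma_1^2 \rfloor \le \lfloor \mu_2 - \sigma_2^2 \rfloor$. Note that

\[ \sigma_1^2 = \sum_{i\in\mathcal{N}_v(\theta)} \mathcal{E}[Z_i](1 - \mathcal{E}[Z_i]) \ge n_v(\theta)\,\frac{\lfloor z^\alpha \rfloor}{z}\Big(1 - \frac{\lfloor z^\alpha \rfloor}{z}\Big) \ge n_v(\theta)\,\frac{z^\alpha}{2z}, \]

where the last inequality holds for values of $z$ which are larger than some function of the constant $\alpha$. Also,

\[ |\sigma_1^2 - \sigma_2^2| \le \sum_{i\in\mathcal{N}_v(\theta)} \big|\mathcal{E}[Z_i](1 - \mathcal{E}[Z_i]) - \mathcal{E}[\hat Z_i](1 - \mathcal{E}[\hat Z_i])\big| \le \sum_{i\in\mathcal{N}_v(\theta)} \Big(|p_{i,v}(\ell^*_v) - \hat p_{i,v}(\ell^*_v)| + |p_{i,v}(\ell^*_v)^2 - \hat p_{i,v}(\ell^*_v)^2|\Big) \le \frac{3\,n_v(\theta)}{z}, \qquad (32) \]

using $|p_{i,v}(\ell^*_v) - \hat p_{i,v}(\ell^*_v)| \le \frac{1}{z}$.

Using the above and Lemma C.2 we have that

\[ \|TP(\mu_1, \sigma_1^2) - TP(\mu_2, \sigma_2^2)\|_{TV} \le \frac{|\mu_1 - \mu_2|}{\sigma_1} + \frac{|\sigma_1^2 - \sigma_2^2| + 1}{\sigma_1^2} \le \frac{|\mu_1 - \mu_2|}{\sigma_1} + \frac{3n_v(\theta)/z + 1}{n_v(\theta)\, z^{\alpha}/(2z)} \le \frac{|\mu_1 - \mu_2|}{\sigma_1} + O(z^{-\alpha}) + O\big(z^{-(\alpha+\beta-1)}\big), \]

where the last step uses $n_v(\theta) \ge z^\beta$ from (9).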



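To bound the ratio $\frac{|\mu_1 - \mu_2|}{\sigma_1}$ we distinguish the following cases:

• $\sqrt{3\log z}\,\sqrt{2^{-\mathrm{depth}_T(v)}|\mathcal{I}|} \le \frac12\, 2^{-\mathrm{depth}_T(v)}|\mathcal{I}|$: Combining this inequality with (11) we get that

\[ |\mathcal{I}| \le 2^{1+\mathrm{depth}_T(v)}\, n_v(\theta). \]

Hence, by (10) and the lower bound on $\sigma_1^2$,

\[ \frac{|\mu_1 - \mu_2|}{\sigma_1} \le \frac{1 + \sqrt{\log z}\,\sqrt{2^{1+\mathrm{depth}_T(v)}\, n_v(\theta)}}{z\,\sigma_1} = O\Big(\frac{2^{\mathrm{depth}_T(v)/2}\sqrt{\log z}}{z^{(1+\alpha)/2}}\Big). \]

• $\sqrt{3\log z}\,\sqrt{2^{-\mathrm{depth}_T(v)}|\mathcal{I}|} > \frac12\, 2^{-\mathrm{depth}_T(v)}|\mathcal{I}|$: It follows that

\[ |\mathcal{I}| < 12\cdot 2^{\mathrm{depth}_T(v)} \log z. \]

Hence, by (10),

\[ |\mu_1 - \mu_2| \le \frac{1 + \sqrt{\log z}\,\sqrt{12\cdot 2^{\mathrm{depth}_T(v)}\log z}}{z}, \qquad \text{so that} \qquad \frac{|\mu_1 - \mu_2|}{\sigma_1} = O\Big(\frac{2^{\mathrm{depth}_T(v)/2}\log z}{z^{(1+\alpha+\beta)/2}}\Big), \]

using $\sigma_1^2 \ge n_v(\theta)\, z^{\alpha-1}/2 \ge z^{\alpha+\beta-1}/2$.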



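Combining the above, it follows that

\[ \frac{|\mu_1 - \mu_2|}{\sigma_1} + \frac{|\sigma_1^2 - \sigma_2^2| + 1}{\sigma_1^2} \le O\Big(\frac{2^{\mathrm{depth}_T(v)/2}\sqrt{\log z}}{z^{(1+\alpha)/2}}\Big) + O\Big(\frac{2^{\mathrm{depth}_T(v)/2}\log z}{z^{(1+\alpha+\beta)/2}}\Big) + O(z^{-\alpha}) + O\big(z^{-(\alpha+\beta-1)}\big). \]

Combining the above with (31) and (32) we get

\[ \big\|F_v(\cdot \mid \Phi = \theta) - \hat F_v(\cdot \mid \hat\Phi = \theta)\big\|_{TV} \le O\Big(\frac{2^{\mathrm{depth}_T(v)/2}\sqrt{\log z}}{z^{(1+\alpha)/2}}\Big) + O\Big(\frac{2^{\mathrm{depth}_T(v)/2}\log z}{z^{(1+\alpha+\beta)/2}}\Big) + O(z^{-\alpha}) + O\big(z^{-(\alpha+\beta-1)/2}\big). \]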



D Concentration of the Leaf Experiments

The following lemmas constitute the last piece of the puzzle and complete the proof of Lemma 3.5. They roughly state that, after the random walk in Stage 1 of the processes SAMPLING is performed, the experiments that will take place in Stage 2 of the processes SAMPLING are similar with high probability.

Proof of Lemma 3.10: Note that

\[ \mu_v(\Phi) = \sum_{i\in\mathcal{I}} \Omega_i =: \Omega, \]

where $\{\Omega_i\}_i$ are independent random variables defined as

\[ \Omega_i = \begin{cases} p_{i,v}(\ell^*_v), & \text{with probability } 2^{-\mathrm{depth}_T(v)},\\ 0, & \text{with probability } 1 - 2^{-\mathrm{depth}_T(v)}. \end{cases} \]

We apply the following version of Chernoff/Hoeffding bounds to the random variables $\Omega'_i := z^{1-\alpha}\Omega_i \in [0,1]$.

Lemma D.1 (Chernoff/Hoeffding) Let $Z_1, \ldots, Z_m$ be independent random variables with $Z_i \in [0,1]$, for all $i$. Then, if $Z = \sum_{i=1}^m Z_i$ and $\gamma \in (0,1)$,

\[ \Pr\big[|Z - \mathcal{E}[Z]| \ge \gamma\,\mathcal{E}[Z]\big] \le 2\exp\big(-\gamma^2\,\mathcal{E}[Z]/3\big). \]
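To see Lemma D.1 in action, the Python sketch below (illustrative parameters, hypothetical names) estimates $\Pr[|Z - \mathcal{E}[Z]| \ge \gamma\mathcal{E}[Z]]$ by simulation for a sum of independent Bernoulli variables, a special case of $[0,1]$-valued summands, and compares it with $2\exp(-\gamma^2\mathcal{E}[Z]/3)$. For 200 variables with mean $0.3$ and $\gamma = 0.3$ the bound is about $0.33$, while the simulated frequency is far smaller.

```python
import math
import random


def chernoff_rhs(gamma, mean):
    """Right-hand side 2 * exp(-gamma^2 * E[Z] / 3) of Lemma D.1."""
    return 2.0 * math.exp(-gamma ** 2 * mean / 3.0)


def empirical_tail(rng, ps, gamma, trials):
    """Fraction of trials with |Z - E[Z]| >= gamma * E[Z], Z = sum of Bernoulli(p_i)."""
    mean = sum(ps)
    hits = 0
    for _ in range(trials):
        z = sum(1 for p in ps if rng.random() < p)
        if abs(z - mean) >= gamma * mean:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    ps = [0.3] * 200
    print(empirical_tail(random.Random(0), ps, 0.3, 5000), chernoff_rhs(0.3, sum(ps)))
```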



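Letting $\Omega' = \sum_{i\in\mathcal{I}} \Omega'_i$ and applying the above lemma with $\gamma := \sqrt{\log z / \mathcal{E}[\Omega']}$, it follows that

\[ \Pr\Big[|\Omega' - \mathcal{E}[\Omega']| \ge \sqrt{\mathcal{E}[\Omega']\log z}\Big] \le 2z^{-1/3}, \]

which in turn implies

\[ \Pr\Big[|\Omega - \mathcal{E}[\Omega]| \ge z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\Omega]\log z}\Big] \le 2z^{-1/3}, \]

or, equivalently,

\[ \Pr\Big[|\mu_v(\Phi) - \mathcal{E}[\mu_v(\Phi)]| \ge z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\mu_v(\Phi)]\log z}\Big] \le 2z^{-1/3}. \]

Similarly, it can be derived that

\[ \Pr\Big[|\hat\mu_v(\hat\Phi) - \mathcal{E}[\hat\mu_v(\hat\Phi)]| \ge z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\hat\mu_v(\hat\Phi)]\log z}\Big] \le 2z^{-1/3}. \]

Let us consider the joint probability space which makes $\Phi = \hat\Phi$ with probability 1; this space exists since, as we observed above, $G(\theta) = \hat G(\theta)$ for all $\theta$. By a union bound for this space,

\[ \Pr\Big[|\mu_v(\Phi) - \mathcal{E}[\mu_v(\Phi)]| \ge z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\mu_v(\Phi)]\log z}\ \vee\ |\hat\mu_v(\Phi) - \mathcal{E}[\hat\mu_v(\hat\Phi)]| \ge z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\hat\mu_v(\hat\Phi)]\log z}\Big] \le 4z^{-1/3}, \]

which implies

\[ G\Big(\theta : |\mu_v(\theta) - \mathcal{E}[\mu_v(\Phi)]| < z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\mu_v(\Phi)]\log z}\ \wedge\ |\hat\mu_v(\theta) - \mathcal{E}[\hat\mu_v(\hat\Phi)]| < z^{(\alpha-1)/2}\sqrt{\mathcal{E}[\hat\mu_v(\hat\Phi)]\log z}\Big) \ge 1 - 4z^{-1/3}. \]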



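Proof of Lemma 3.11: Suppose that the random variables $\Phi$ and $\hat\Phi$ are coupled so that, with probability 1, $\Phi = \hat\Phi$. Then

\[ \mu_v(\Phi) - \hat\mu_v(\hat\Phi) = \sum_{i\in\mathcal{I}} \Omega_i =: \Omega, \]

where $\{\Omega_i\}_i$ are independent random variables defined as

\[ \Omega_i = \begin{cases} p_{i,v}(\ell^*_v) - \hat p_{i,v}(\ell^*_v), & \text{with probability } 2^{-\mathrm{depth}_T(v)},\\ 0, & \text{with probability } 1 - 2^{-\mathrm{depth}_T(v)}. \end{cases} \]

We apply Hoeffding's inequality to the random variables $\Omega_i$.

Lemma D.2 (Hoeffding's Inequality) Let $X_1, \ldots, X_n$ be independent random variables. Assume that, for all $i$, $\Pr[X_i \in [a_i, b_i]] = 1$. Then, for $t > 0$:

\[ \Pr\Big[\sum_i X_i - \mathcal{E}\Big[\sum_i X_i\Big] \ge t\Big] \le \exp\Big(-\frac{2t^2}{\sum_i (b_i - a_i)^2}\Big). \]

Applying the above lemma (to $\Omega$ and to $-\Omega$) we get

\[ \Pr\big[|\Omega - \mathcal{E}[\Omega]| \ge t\big] \le 2\exp\Big(-\frac{2t^2}{4|\mathcal{I}|/z^2}\Big), \]

since, for all $i \in \mathcal{I}$, $|p_{i,v}(\ell^*_v) - \hat p_{i,v}(\ell^*_v)| \le \frac1z$, so that each $\Omega_i$ takes values in $[-\frac1z, \frac1z]$. Setting $t = \frac{\sqrt{\log z}}{z}\sqrt{|\mathcal{I}|}$ we get

\[ \Pr\Big[|\Omega - \mathcal{E}[\Omega]| \ge \frac{\sqrt{\log z}}{z}\sqrt{|\mathcal{I}|}\Big] \le \frac{2}{z^{1/2}}. \]

Note that

\[ |\mathcal{E}[\Omega]| = |\mathcal{E}[\mu_v(\Phi)] - \mathcal{E}[\hat\mu_v(\hat\Phi)]| \le \frac{2^{-\mathrm{depth}_T(v)}}{z} \le \frac{1}{z}. \]

It follows from the above that

\[ \Pr\Big[|\Omega| \le \frac{1 + \sqrt{\log z}\,\sqrt{|\mathcal{I}|}}{z}\Big] \ge 1 - \frac{2}{z^{1/2}}, \]

which gives immediately that

\[ G\Big(\theta : |\mu_v(\theta) - \hat\mu_v(\theta)| \le \frac{1 + \sqrt{\log z}\,\sqrt{|\mathcal{I}|}}{z}\Big) \ge 1 - \frac{2}{z^{1/2}}. \]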



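Moreover, an easy application of Lemma D.1 gives

\[ G\Big(\theta : \big|n_v(\theta) - 2^{-\mathrm{depth}_T(v)}|\mathcal{I}|\big| \le \sqrt{3\log z}\,\sqrt{2^{-\mathrm{depth}_T(v)}|\mathcal{I}|}\Big) \ge 1 - \frac{2}{z}. \qquad (33) \]

Indeed, let $W_i = \mathbf{1}_{\{\Phi_i = v\}}$. Then $n_v(\Phi) = \sum_{i\in\mathcal{I}} W_i$ and $\mathcal{E}\big[\sum_{i\in\mathcal{I}} W_i\big] = 2^{-\mathrm{depth}_T(v)}|\mathcal{I}|$. Applying Lemma D.1 with $\gamma = \sqrt{\frac{3\log z}{2^{-\mathrm{depth}_T(v)}|\mathcal{I}|}}$ we get

\[ \Pr\Big[\Big|\sum_{i\in\mathcal{I}} W_i - \mathcal{E}\Big[\sum_{i\in\mathcal{I}} W_i\Big]\Big| \ge \sqrt{3\log z}\,\sqrt{2^{-\mathrm{depth}_T(v)}|\mathcal{I}|}\Big] \le \frac{2}{z}, \]

which implies (33).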

z 



